The Annals of Applied Probability 
2009, Vol. 19, No. 3, 1108-1142 
DOI: 10.1214/08-AAP569 

© Institute of Mathematical Statistics, 2009 



ASYMPTOTIC NORMALITY OF PLUG-IN LEVEL SET ESTIMATES

By David M. Mason and Wolfgang Polonik
University of Delaware and University of California, Davis 

We establish the asymptotic normality of the G-measure of the symmetric difference between the level set and a plug-in-type estimator of it, formed by replacing the density in the definition of the level set by a kernel density estimator. Our proof highlights the efficacy of Poissonization methods in the treatment of large sample theory problems of this kind.

1. Introduction. Let $f$ be a Lebesgue density on $\mathbb{R}^d$, $d \ge 1$. Define the level set of $f$ at level $c > 0$ as

$$C(c) = \{x : f(x) \ge c\}.$$

In this paper we are concerned with the estimation of $C(c)$ for a given level $c$. Such level sets play a crucial role in various scientific fields. Their estimation has received significant recent interest in statistics and in machine learning/pattern recognition (see below for more details). Theoretical research on this topic is mainly concerned with rates of convergence of level set estimators. While such results are interesting, they have only limited use in practical applications. The available results do not permit statistical inference or quantitative statements about the contour sets themselves. This paper takes a significant step forward in this direction. We establish the asymptotic normality of a class of level set estimators $C_n(c)$, formed by replacing $f$ by a kernel density estimator in the definition of $C(c)$, in a sense that we shall soon make precise.
Here is our setup. Let $X_1, X_2, \ldots$ be i.i.d. with distribution function $F$ and density $f$. Consider the kernel density estimator of $f$ based on $X_1, \ldots, X_n$, $n \ge 1$,

$$f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n^{1/d}}\right), \qquad x \in \mathbb{R}^d,$$

where $K$ is a kernel and $h_n > 0$ is a smoothing parameter. Consider the plug-in estimator

$$C_n(c) = \{x : f_n(x) \ge c\}.$$

Let $G$ be a positive measure dominated by Lebesgue measure $\lambda$. Our interest is to establish the asymptotic normality of

$$d_G(C_n(c), C(c)) := G(C_n(c)\,\Delta\, C(c)) = \int |I\{f_n(x) \ge c\} - I\{f(x) \ge c\}|\, dG(x), \tag{1.1}$$

where $A\,\Delta\, B = (A \setminus B) \cup (B \setminus A)$ denotes the set-theoretic symmetric difference of two sets. Of particular interest is $G$ being the Lebesgue measure $\lambda$. Also of interest is $G = H$, where $H$ denotes the measure having Lebesgue density $|f(x) - c|$. The latter corresponds to the so-called excess risk, which is used frequently in the classification literature, that is,

$$d_H(C_n(c), C(c)) = \int_{C_n(c)\,\Delta\, C(c)} |f(x) - c|\, dx. \tag{1.2}$$
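To make (1.1) and (1.2) concrete, here is a minimal numerical sketch that is not taken from the paper. It assumes $d = 1$, $f$ the standard normal density, and a hypothetical Epanechnikov-type kernel supported on $[-1/2, 1/2]$. It approximates all integrals by Riemann sums on a grid. The helper names (`kernel`, `kde`, `level_set_discrepancies`) are ours.

```python
import numpy as np

def kernel(t):
    # Hypothetical kernel: a probability density supported on [-1/2, 1/2]
    return np.where(np.abs(t) <= 0.5, 1.5 * (1.0 - 4.0 * t**2), 0.0)

def kde(grid, data, h, d=1):
    # f_n(x) = (1/(n h_n)) * sum_i K((x - X_i) / h_n^{1/d}), evaluated on a grid (d = 1 here)
    scaled = (grid[:, None] - data[None, :]) / h ** (1.0 / d)
    return kernel(scaled).sum(axis=1) / (len(data) * h)

def level_set_discrepancies(grid, f_true, f_est, c):
    """Grid approximations of d_lambda ((1.1) with G = lambda) and d_H ((1.2))."""
    dx = grid[1] - grid[0]
    sym_diff = (f_est >= c) != (f_true >= c)      # indicator of C_n(c) Δ C(c)
    d_lambda = sym_diff.sum() * dx
    d_H = (np.abs(f_true - c) * sym_diff).sum() * dx
    return d_lambda, d_H

rng = np.random.default_rng(0)
n, h, c = 2000, 0.3, 0.1
grid = np.linspace(-5.0, 5.0, 2001)
f_true = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
f_est = kde(grid, rng.standard_normal(n), h)
d_lambda, d_H = level_set_discrepancies(grid, f_true, f_est, c)
```

Replacing $f_n$ by $f$ makes both discrepancies vanish. This gives a quick sanity check of the grid implementation.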

It is well known that under mild conditions $d_\lambda(C_n(c), C(c)) \to 0$ in probability as $n \to \infty$. Rates of convergence have also been derived [cf. Baíllo, Cuevas and Justel (2000), Baíllo, Cuesta-Albertos and Cuevas (2001), Cuevas, Febrero and Fraiman (2000), Baíllo (2003) and Baíllo and Cuevas (2006)]. Even more is known. Cadre (2006) derived assumptions under which, for some $\mu_G > 0$,

$$\sqrt{n h_n}\; d_G(C_n(c), C(c)) \to \mu_G \quad \text{in probability as } n \to \infty. \tag{1.3}$$

However, asymptotic normality of $d_G(C_n(c), C(c))$ has not yet been considered.

Our main result says that, under suitable regularity conditions, there exist a normalizing sequence $\{a_{n,G}\}$ and a constant $0 < \sigma_G < \infty$ such that

$$a_{n,G}\{d_G(C_n(c), C(c)) - E\, d_G(C_n(c), C(c))\} \xrightarrow{d} \sigma_G Z \quad \text{as } n \to \infty, \tag{1.4}$$

where $Z$ denotes a standard normal random variable. Two special cases are important: $G = \lambda$, the Lebesgue measure, and $G = H$. For these we shall see that, under suitable regularity conditions,

$$a_{n,\lambda} = \left(\frac{n}{h_n}\right)^{1/4} \tag{1.5}$$

and

$$a_{n,H} = (n^3 h_n)^{1/4}, \tag{1.6}$$

respectively.

In Section 1.1 we discuss further related work and relevant literature. In Section 2 we formulate our main result and provide some heuristics for its validity. We then discuss a possible statistical application and present the proof of our result. We end Section 2 with an example and some proposals to estimate the limiting variance $\sigma_G^2$.

1.1. Related work and literature. Before we present our results in detail, 
we shall extend our overview of the literature on level set estimation to 
include regression level set estimation (with classification as a special case) 
as well as density level set estimation. 

There is a close connection between level set estimation and binary classification. The optimal (Bayes) classifier corresponds to a level set $C_\psi(0) = \{x : \psi(x) \ge 0\}$ of $\psi = p f - (1 - p) g$. Here $f$ and $g$ denote the Lebesgue densities of two underlying class distributions $F$ and $G$, and $p \in [0, 1]$ is the prior probability for $f$. If an observation $X$ falls into $\{x : \psi(x) \ge 0\}$, the optimal classifier classifies it as coming from $F$; otherwise it classifies it as coming from $G$. Hall and Kang (2005) derive large sample results for this optimal classifier that are very closely related to Cadre's result (1.3). Let $\mathrm{Err}(C)$ denote the probability of misclassification of a binary classifier given by a set $C$. Hall and Kang derive rates of convergence for the quantity $\mathrm{Err}(\hat C(0)) - \mathrm{Err}(C_\psi(0))$, where $\hat C$ is the plug-in classifier $\hat C(0) = \{x : p f_n(x) - (1 - p) g_n(x) \ge 0\}$. Here $f_n$ and $g_n$ denote the kernel estimators for $f$ and $g$, respectively. It turns out that

$$\mathrm{Err}(\hat C(0)) - \mathrm{Err}(C_\psi(0)) = \int_{\hat C(0)\,\Delta\, C_\psi(0)} |\psi(x)|\, dx.$$

The latter quantity has exactly the form (1.2). The only difference is that the function is not a probability density but a (weighted) difference of two probability densities. Similarly, the plug-in estimate is a weighted difference of kernel estimates. The results presented here do not apply directly to this situation. However, the methodology used to prove them can be adapted to it in a more or less straightforward manner.
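The excess-risk identity can be checked numerically. Purely for illustration, the following sketch assumes $d = 1$, $F = N(0,1)$, $G = N(1.5, 1)$ and $p = 0.4$. It uses a hypothetical competitor set $\hat C = \{x < 0.5\}$ in place of the plug-in classifier. Integrals are Riemann sums on a grid, and on the grid the identity holds exactly up to rounding.

```python
import numpy as np

x = np.linspace(-10.0, 12.0, 22001)
dx = x[1] - x[0]
p = 0.4                                              # prior probability of class F
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)           # density of class F
g = np.exp(-(x - 1.5)**2 / 2) / np.sqrt(2 * np.pi)   # density of class G
psi = p * f - (1 - p) * g

def err(in_C):
    """Misclassification probability of the rule 'X in C => class F' (grid approximation)."""
    return (p * f * ~in_C + (1 - p) * g * in_C).sum() * dx

bayes = psi >= 0          # C_psi(0), the Bayes classifier
C_hat = x < 0.5           # hypothetical competitor set
excess = err(C_hat) - err(bayes)
integral = (np.abs(psi) * (C_hat != bayes)).sum() * dx
```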

Hartigan (1975) introduced a notion of clustering via maximally connected components of density level sets. For more on this approach to clustering see Stuetzle (2003). For an interesting application of this clustering approach to astronomical sky surveys, refer to Jang (2006). Klemelä (2004, 2006a, 2008) applies a similar point of view to develop methods for visualizing multivariate density estimates. Goldenshluger and Zeevi (2004) use level set estimation in the context of the Hough transform, a well-known computer vision algorithm. Certain problems in flow cytometry involve estimating a level set of a difference of two probability densities [Roederer and Hardy (2001); see also Wand (2005)]. Further relevant applications include detection of minefields based on aerial observations, the analysis of seismic data, and certain issues in image segmentation; see Huo and Lu (2004) and references therein.

Another application of level set estimation is anomaly detection, or novelty detection. For instance, Theiler and Cai (2003) describe how level set estimation and anomaly detection go together in multispectral image analysis. There, anomalous locations (pixels) correspond to unusual spectral signatures in the images. Further areas of anomaly detection include intrusion detection [e.g., Fan et al. (2001) and Yeung and Chow (2002)], anomalous jet engine vibrations [e.g., Nairac et al. (1997), Desforges, Jacob and Cooper (1998) and King et al. (2002)], medical imaging [e.g., Gerig, Jomier and Chakos (2001) and Prastawa et al. (2003)] and EEG-based seizure analysis [Gardner et al. (2006)]. For a recent review of this area see Markou and Singh (2003).

The above list of applications clearly motivates the need to understand the statistical properties of level set estimators. For this reason there has been a lot of recent research in this area. Relevant published work (not yet mentioned above) includes Hartigan (1987), Polonik (1995), Cavalier (1997), Tsybakov (1997), Walther (1997), Baíllo, Cuevas and Justel (2000), Baíllo, Cuesta-Albertos and Cuevas (2001), Cuevas, Febrero and Fraiman (2000), Baíllo (2003), Tsybakov (2004), Steinwart, Hush and Scovel (2004, 2005), Gayraud and Rousseau (2005), Willett and Nowak (2005, 2006), Cuevas, González-Manteiga and Rodríguez-Casal (2006), Scott and Davenport (2006), Scott and Nowak (2006), Vert and Vert (2006) and Rigollet and Vert (2008).

Finally we mention a problem closely related to that of level set estima- 
tion. This is the problem of the estimation of the support of a density, when 
the support is assumed to be bounded. It turns out that the methods of es- 
timation and the techniques used to study the asymptotic properties of the 
estimator are very similar to those of level set estimation. Refer especially 
to Biau, Cadre and Pelletier (2008) and the references therein. 

2. Main result. The rates of convergence in our main result depend on a regularity parameter $1/\gamma_G$. This parameter describes the behavior of the slope of $g$ at the boundary set $\beta(c) = \{x : f(x) = c\}$ [see assumption (G) below]. In the important special case $G = \lambda$ the slope of $g$ is zero, which implies $1/\gamma_G = 0$ (or $\gamma_G = \infty$). For $G = H$ our assumptions imply that the slope of $g$ close to the boundary is bounded away from zero and infinity, so that $1/\gamma_G = 1$.

Here is our main result. The assumptions it uses are quite technical to state, so for convenience they are formulated in Section 2.4 below. In particular, the integer $k \ge 1$ that appears in the statement of our theorem is defined in (B.ii).

Theorem 1. Under assumptions (D.i)-(D.ii), (K.i)-(K.ii), (G), (H) and (B.i)-(B.ii), we have as $n \to \infty$ that

$$a_{n,G}\{d_G(C_n(c), C(c)) - E\, d_G(C_n(c), C(c))\} \xrightarrow{d} \sigma_G Z, \tag{2.1}$$

where

$$a_{n,G} = \left(\frac{n}{h_n}\right)^{1/4}\left(\sqrt{n h_n}\right)^{1/\gamma_G}.$$

The constant $0 < \sigma_G < \infty$ is defined as follows:

- in the case $d \ge 2$ and $k = 1$, as in (2.57);
- in the case $d \ge 2$ and $k \ge 2$, as in (2.61);
- in the case $d = 1$ and $k \ge 2$, as in (2.62).

(The case $d = 1$ and $k = 1$ cannot occur under our assumptions.)

Remark 1. Write

$$\delta_n(c) = a_{n,G}\{d_G(C_n(c), C(c)) - E\, d_G(C_n(c), C(c))\}. \tag{2.2}$$

A slight extension of the proof of our theorem gives the following. Let $c_1, \ldots, c_m$, $m \ge 1$, be distinct positive numbers, each of which satisfies the assumptions of the theorem. Then

$$(\delta_n(c_1), \ldots, \delta_n(c_m)) \xrightarrow{d} (\sigma_1 Z_1, \ldots, \sigma_m Z_m),$$

where $Z_1, \ldots, Z_m$ are independent standard normal random variables and $\sigma_1, \ldots, \sigma_m$ are as defined in the proof of the theorem.

Remark 2. In Section 2.7 we give an example where the variance $\sigma_G^2$ has a closed form convenient for calculation. Such a closed form cannot be given in general. Section 2.7 also discusses some methods to estimate $\sigma_G^2$ from the data.

2.1. Heuristics. Before we continue, we give some heuristics indicating why $a_n = (n/h_n)^{1/4}$ is the correct normalizing factor in (1.5). That is, we consider the case $G = \lambda$, or $\gamma_G = \infty$. This should help the reader understand why our theorem is true. It is well known that under certain regularity conditions

$$\sqrt{n h_n}\,\{f_n(x) - f(x)\} = O_p(1) \quad \text{as } n \to \infty.$$
Therefore the boundary of the set $C_n(c)$ can be expected to fluctuate in a band $B$ around the boundary set $\beta(c) = \{x : f(x) = c\}$. The width of this band is roughly of order $O_p(1/\sqrt{n h_n})$. For notational simplicity we write $\beta = \beta(c)$.



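Partition $B$ into $N = O\big(1/(h_n\sqrt{n h_n})\big) = O\big(1/\sqrt{n h_n^3}\big)$ regions $R_k$, $k = 1, \ldots, N$, each of Lebesgue measure $\lambda(R_k) = h_n$. We can then approximate $d_\lambda(C_n(c), C(c))$ as

$$d_\lambda(C_n(c), C(c)) \approx \sum_{k=1}^{N} \int_{R_k} |I\{f_n(x) \ge c\} - I\{f(x) \ge c\}|\, dx =: \sum_{k=1}^{N} Y_{n,k}.$$

Here we use the fact that the band $B$ has width $1/\sqrt{n h_n}$. Write

$$Y_{n,k} = \int_{R_k} \Delta_n(x)\, dx$$

with

$$\Delta_n(x) = |I\{f_n(x) \ge c\} - I\{f(x) \ge c\}|. \tag{2.3}$$

Then

$$\mathrm{Var}(Y_{n,k}) = \int_{R_k}\int_{R_k} \mathrm{cov}(\Delta_n(x), \Delta_n(y))\, dx\, dy = O(\lambda(R_k)^2) = O(h_n^2),$$

where the $O$-terms turn out to be exact. Further, the regions $R_k$ can be chosen to be disjoint. Because $K$ has compact support, $f_n(x)$ and $f_n(y)$ then use observations from disjoint windows once $|x - y| > h_n^{1/d}$. Due to this property of the kernel density estimator, the variables $Y_{n,k}$ can be assumed to behave asymptotically like independent variables. Hence the variance of $d_\lambda(C_n(c), C(c))$ can be expected to be of order $N h_n^2 = (h_n/n)^{1/2}$. This motivates the normalizing factor $a_n = (n/h_n)^{1/4}$.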
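The order-of-magnitude bookkeeping in this heuristic can be mechanized. In the sketch below, which is our own device and not part of the paper, an order $n^a h_n^b$ is stored as the exponent pair $(a, b)$ with exact fractions. It reproduces $N h_n^2 = (h_n/n)^{1/2}$ and $a_n = (n/h_n)^{1/4}$. For the general normalization $a_{n,G} = (n/h_n)^{1/4}(\sqrt{n h_n})^{1/\gamma_G}$ it also reproduces the case $1/\gamma_G = 1$, namely $(n^3 h_n)^{1/4}$.

```python
from fractions import Fraction as Fr

# An order n**a * h_n**b is stored as the exponent pair (a, b).
def mul(p, q):
    return (p[0] + q[0], p[1] + q[1])

def power(p, r):
    return (p[0] * r, p[1] * r)

band_width = (Fr(-1, 2), Fr(-1, 2))   # 1 / sqrt(n h_n)
cell_volume = (Fr(0), Fr(1))          # lambda(R_k) = h_n

N = mul(band_width, power(cell_volume, -1))   # number of cells R_k
variance = mul(N, power(cell_volume, 2))      # N roughly independent terms, each of order h_n^2
a_n = power(variance, Fr(-1, 2))              # normalizing factor = variance^(-1/2)

def a_nG(inv_gamma):
    """Exponents of a_{n,G} = (n/h_n)^{1/4} * (sqrt(n h_n))^{1/gamma_G}."""
    return mul(a_n, power((Fr(1, 2), Fr(1, 2)), inv_gamma))
```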



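2.2. A connection to $L_p$-rates of convergence of kernel density estimates. The following discussion of $L_p$-rates, $p \ge 1$, of convergence of kernel density estimates implicitly provides another heuristic for our result.

Consider the case $G = H_{p-1}$, where $H_{p-1}$ denotes the measure with Radon-Nikodym derivative $h_{p-1}(x) = |f(x) - c|^{p-1}$, $p \ge 1$. Note that $H_1 = H$, with $H$ from above. Then we have the identity

$$\int_0^\infty H_{p-1}(C_n(c)\,\Delta\, C(c))\, dc = \frac{1}{p}\int_{\mathbb{R}^d} |f_n(x) - f(x)|^p\, dx, \qquad p \ge 1. \tag{2.4}$$

The proof is straightforward [see Mason and Polonik (2008), Appendix, Detail 1]. For fixed $x$, the integrand in $c$ is nonzero exactly for $c$ between $f(x)$ and $f_n(x)$. Integrating $|f(x) - c|^{p-1}$ over this range gives $|f_n(x) - f(x)|^p/p$, and Fubini's theorem completes the argument. The case $p = 1$ gives the geometrically intuitive relation

$$\int_0^\infty \lambda(C_n(c)\,\Delta\, C(c))\, dc = \int_0^\infty\int_{C_n(c)\,\Delta\, C(c)} dx\, dc = \int_{\mathbb{R}^d} |f_n(x) - f(x)|\, dx.$$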
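Because (2.4) is a pointwise identity integrated over $x$, it holds for any pair of nonnegative functions and can be checked numerically. The sketch below assumes $d = 1$. It replaces $f_n$ by a fixed, hypothetical stand-in, namely a normal density with shifted mean and inflated scale. It then compares both sides of (2.4) for $p = 1, 2$ using Riemann sums in $x$ and a midpoint rule in $c$.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)                               # "true" density
fn = np.exp(-(x - 0.3)**2 / (2 * 1.2**2)) / (np.sqrt(2 * np.pi) * 1.2)   # stand-in for f_n

def lhs(p, dc=1e-4, c_max=0.5):
    """int_0^infty H_{p-1}(C_n(c) Δ C(c)) dc: midpoint rule in c, Riemann sum in x."""
    total = 0.0
    for c in (np.arange(int(round(c_max / dc))) + 0.5) * dc:
        in_sym_diff = (fn >= c) != (f >= c)
        total += np.sum(np.abs(f[in_sym_diff] - c) ** (p - 1)) * dx
    return total * dc

def rhs(p):
    """(1/p) * int |f_n - f|^p dx."""
    return np.sum(np.abs(fn - f) ** p) * dx / p
```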
Assume $f$ is bounded. Split the vertical axis into successive intervals $\Delta(k)$, $k = 1, \ldots, N$, of length $1/\sqrt{n h_n}$ with midpoints $c_k$. Approximate the integral (2.4) by

$$\begin{aligned}\frac{1}{p}\int_{\mathbb{R}^d} |f_n(x) - f(x)|^p\, dx &= \int_0^\infty H_{p-1}(C_n(c)\,\Delta\, C(c))\, dc\\ &\approx \sum_{k=1}^{N}\int_{\Delta(k)} H_{p-1}(C_n(c)\,\Delta\, C(c))\, dc\\ &\approx \frac{1}{\sqrt{n h_n}}\sum_{k=1}^{N} H_{p-1}(C_n(c_k)\,\Delta\, C(c_k)).\end{aligned}$$



Using the $1/\sqrt{n h_n}$-rate of $f_n(x)$, we see that the last sum consists of (roughly) independent random variables. Assume further that the variance of each (or of most) of these random variables is of the same order $a_{n,H_{p-1}}^{-2} = (h_n/n)^{1/2}(n h_n)^{-(p-1)}$. To obtain this, apply our theorem with $\gamma_G = 1/(p - 1)$. There are $N \asymp \sqrt{n h_n}$ terms, so the variance of the last expression is of order

$$\frac{1}{n h_n}\cdot N\cdot a_{n,H_{p-1}}^{-2} \asymp \frac{1}{n h_n}\cdot\sqrt{n h_n}\cdot\left(\frac{h_n}{n}\right)^{1/2}(n h_n)^{-(p-1)} = \frac{1}{n}\,(n h_n)^{-(p-1)}.$$

In other words, the normalizing factor of the $L_p$-norm of the kernel density estimator in $\mathbb{R}^d$ can be expected to be $(n h_n^{1-1/p})^{p/2} = (n h_n)^{p/2} h_n^{-1/2}$. In the case $p = 2$ this gives the normalizing factor $n h_n h_n^{-1/2} = n h_n^{1/2}$, which coincides with the results of Rosenblatt (1975). In the special case $d = 2$ these rates can also be found in Horváth (1991).
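The exponent arithmetic leading to the $L_p$ normalization can be checked in the same way. The sketch below is our own bookkeeping, with an order $n^a h_n^b$ stored as $(a, b)$. It combines the slice variance $a_{n,H_{p-1}}^{-2}$, the roughly $\sqrt{n h_n}$ slices and the prefactor $1/\sqrt{n h_n}$. The result is $(n h_n^{1-1/p})^{p/2}$, that is, exponents $(p/2, (p-1)/2)$. Thus $p = 2$ gives $n h_n^{1/2}$ and $p = 1$ gives $\sqrt{n}$.

```python
from fractions import Fraction as Fr

def lp_normalizing_exponents(p):
    """Exponents (a, b) such that the heuristic normalizing factor for
    int |f_n - f|^p dx is of order n**a * h_n**b."""
    p = Fr(p)
    # Variance of one level slice: a_{n,H_{p-1}}^{-2} = (h_n/n)^{1/2} (n h_n)^{-(p-1)}
    slice_var = (Fr(-1, 2) - (p - 1), Fr(1, 2) - (p - 1))
    # About sqrt(n h_n) slices; the prefactor 1/sqrt(n h_n) enters the variance squared
    total_var = (slice_var[0] + Fr(1, 2) - 1, slice_var[1] + Fr(1, 2) - 1)
    # Normalizing factor = (total variance) ** (-1/2)
    return (-total_var[0] / 2, -total_var[1] / 2)
```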



2.3. Possible application to online testing. Suppose that a certain industrial process, when working properly, produces items that may be considered as i.i.d. $\mathbb{R}^d$ random variables $X_1, X_2, \ldots$ with a known density function $f$. From time to time a sample of size $n$ is taken from the production. We can measure the deviation of the sample $X_1, X_2, \ldots, X_n$ from the desired distribution through the discrepancy between $\lambda(C_n(c)\,\Delta\, C(c))$ and its expected value $E\lambda(C_n(c)\,\Delta\, C(c))$. The value $c$ may be chosen so that

$$P\{X \in C(c)\} = \int_{C(c)} f(x)\, dx = \alpha,$$

with typical values $\alpha = 0.90$, $0.95$ and $0.99$. We may decide to shut down the process and look for production errors if

$$\frac{1}{\sigma_\lambda}\left(\frac{n}{h_n}\right)^{1/4}\left|\lambda(C_n(c)\,\Delta\, C(c)) - E\lambda(C_n(c)\,\Delta\, C(c))\right| > 1.96. \tag{2.5}$$
Otherwise we do not disrupt production, as long as the estimated level set $C_n(c)$ does not deviate too much from the target level set $C(c)$. The target set is the one in which a fraction $\alpha$ of the data should lie if the process is functioning properly. Our central limit theorem implies that, for large enough sample sizes $n$, the probability of the event in (2.5) is around 0.05 when the process is in fact working as it should. Thus, under this decision rule, we make a type I error (stopping production when it is actually working fine) with probability roughly 0.05. Sometimes one might want to replace the $C_n(c)$ in the first $\lambda(C_n(c)\,\Delta\, C(c))$ in (2.5) by $C_n(\hat c_n)$, where

$$\int_{\{f_n(x) \ge \hat c_n\}} f_n(x)\, dx = \alpha.$$
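The sketch below illustrates how $c$ and the rule (2.5) could be computed. It assumes $d = 1$ and a standard normal in-control density $f$. Then $C(c) = [-t, t]$ is an interval, so $\beta$ consists of $k = 2$ points. The sketch takes $E\lambda(C_n(c)\,\Delta\,C(c))$ and $\sigma_\lambda$ as given inputs. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def level_for_coverage(alpha):
    """Level c with P{X in C(c)} = alpha when X ~ N(0, 1) on the real line.

    Here C(c) = [-t, t] with alpha = 2 * Phi(t) - 1, so t = Phi^{-1}((1 + alpha) / 2) and c = phi(t)."""
    t = STD_NORMAL.inv_cdf((1 + alpha) / 2)
    return STD_NORMAL.pdf(t)

def shut_down(stat, expected, sigma, n, h, z_crit=1.96):
    """Decision rule (2.5): flag the process when the normalized deviation exceeds z_crit.

    stat     -- observed lambda(C_n(c) Δ C(c))
    expected -- E lambda(C_n(c) Δ C(c)) under the in-control density
    sigma    -- limiting standard deviation sigma_lambda
    """
    a_n = (n / h) ** 0.25
    return a_n * abs(stat - expected) / sigma > z_crit
```

For example, `level_for_coverage(0.95)` is approximately 0.0584, the normal density at $t \approx 1.96$.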

Desforges, Jacob and Cooper (1998) describe a mechanical engineering application where this approach seems to be of some value. The application concerns gearbox fault data, whose collection is described in that paper. Two classes of data were collected, corresponding to two states: a gear in good condition and a gear in bad condition. Desforges, Jacob and Cooper outline a data analysis approach based on kernel density estimation to recognize the faulty condition. The idea is to calculate a kernel density estimator $g_m$ from the data $X_1, \ldots, X_m$ taken from the gear in good condition. This estimator is then evaluated at the data $Y_1, \ldots, Y_n$ sampled under a bad gear condition. Desforges, Jacob and Cooper then examine the level sets of $g_m$ in which the faulty data lie. One of their ideas is to use $\frac{1}{n}\sum_{i=1}^{n} g_m(Y_i)$ to detect the faulty condition. Their methodology is ad hoc in nature, and no statistical inference procedure is proposed.

Our test procedure could be applied with $f = g_m$, that is, conditionally on $X_1, \ldots, X_m$. Set $C(c) := \{x : g_m(x) \ge c\}$ for an appropriate value of $c$, and find the corresponding set $C_n(c)$ based on $Y_1, \ldots, Y_n$. Then check whether (2.5) holds. If it does, we conclude at significance level 0.05 that $Y_1, \ldots, Y_n$ stem from a different distribution than $X_1, \ldots, X_m$. In this setup both $E\lambda(C_n(c)\,\Delta\, C(c))$ and $\sigma_\lambda^2$ are calculated with $g_m$ as the underlying distribution. (In practice we may have to estimate these two quantities; see Section 2.7.) How this approach works in practice is the subject of a separate paper.
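If $E\lambda(C_n(c)\,\Delta\,C(c))$ and $\sigma_\lambda$ are unknown, one simple possibility is to simulate them under the in-control density. This is our own assumption here, not the estimation method of Section 2.7. The sketch assumes $d = 1$, a standard normal in-control density, a hypothetical kernel on $[-1/2, 1/2]$, and grid approximations. The simulated standard deviation then plays the role of $\sigma_\lambda (h_n/n)^{1/4}$ in (2.5).

```python
import numpy as np

def kernel(t):
    # Hypothetical kernel supported on [-1/2, 1/2]
    return np.where(np.abs(t) <= 0.5, 1.5 * (1.0 - 4.0 * t**2), 0.0)

def sym_diff_measure(sample, grid, f_ref, c, h):
    """lambda(C_n(c) Δ C(c)) on a grid (d = 1), with C(c) taken from the reference density f_ref."""
    dx = grid[1] - grid[0]
    fn = kernel((grid[:, None] - sample[None, :]) / h).sum(axis=1) / (len(sample) * h)
    return np.sum((fn >= c) != (f_ref >= c)) * dx

def null_moments(draw, grid, f_ref, c, n, h, reps=200, seed=0):
    """Monte Carlo mean and standard deviation of lambda(C_n(c) Δ C(c)) when the data
    are drawn from the reference (in-control) distribution via draw(rng, n)."""
    rng = np.random.default_rng(seed)
    stats = np.array([sym_diff_measure(draw(rng, n), grid, f_ref, c, h) for _ in range(reps)])
    return stats.mean(), stats.std(ddof=1)

grid = np.linspace(-4.0, 4.0, 801)
f_ref = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
mean0, sd0 = null_moments(lambda rng, n: rng.standard_normal(n), grid, f_ref, c=0.1, n=1000, h=0.3)

# Standardized statistic for a new batch (also drawn in-control here, for illustration)
new_batch = np.random.default_rng(42).standard_normal(1000)
z = (sym_diff_measure(new_batch, grid, f_ref, 0.1, 0.3) - mean0) / sd0
```

Comparing `abs(z)` with 1.96 mimics (2.5), with simulated rather than asymptotic centering and scaling.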

2.4. Assumptions and notation.

Assumptions on the density $f$.

(D.i) $f$ is in $C^2(\mathbb{R}^d)$ and its partial derivatives of order 1 and 2 are bounded;

(D.ii) $\inf_{x \in \mathbb{R}^d} f(x) < c < \sup_{x \in \mathbb{R}^d} f(x)$.

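Notice that (D.i) implies the existence of positive constants $M$ and $A$ with

$$\sup_x f(x) \le M < \infty \tag{2.6}$$

and

$$\sum_{i=1}^{d}\sum_{j=1}^{d}\sup_x\left|\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j}\right| \le A < \infty. \tag{2.7}$$

[Condition (D.i) implies that $f$ is uniformly continuous on $\mathbb{R}^d$, from which (2.6) follows.]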

Assumptions on $K$.

(K.i) $K$ is a probability density function with support contained in the closed ball of radius 1/2 centered at zero, and it is bounded by a constant $\kappa$.

(K.ii) $\int_{\mathbb{R}^d} t_i K(t)\, dt = 0$ for $i = 1, \ldots, d$.

Observe that (K.i) implies that

$$\int_{\mathbb{R}^d} |t|^2 |K(t)|\, dt =: \kappa_1 < \infty. \tag{2.8}$$
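A concrete kernel satisfying (K.i) and (K.ii) in $d = 2$ is $K(t) = (8/\pi)(1 - 4|t|^2)$ for $|t| \le 1/2$ and $K(t) = 0$ otherwise; this particular choice is ours. It is bounded by $\kappa = 8/\pi$, and its first moments vanish by radial symmetry. A direct calculation gives $\kappa_1 = 1/12$ and $\|K\|_2^2 = \int K^2 = 16/(3\pi)$. The sketch checks these values with a midpoint rule on $[-1/2, 1/2]^2$.

```python
import numpy as np

def K(t1, t2):
    # Hypothetical example kernel on R^2, supported on the closed ball of radius 1/2
    r2 = t1**2 + t2**2
    return np.where(r2 <= 0.25, (8.0 / np.pi) * (1.0 - 4.0 * r2), 0.0)

# Midpoint grid on [-1/2, 1/2]^2, which contains the support of K
m = 1000
u = (np.arange(m) + 0.5) / m - 0.5
T1, T2 = np.meshgrid(u, u, indexing="ij")
cell = (1.0 / m) ** 2
vals = K(T1, T2)

mass = vals.sum() * cell                                              # close to 1, as in (K.i)
first_moments = ((T1 * vals).sum() * cell, (T2 * vals).sum() * cell)  # close to 0, as in (K.ii)
kappa_1 = ((T1**2 + T2**2) * vals).sum() * cell                       # the constant in (2.8)
kernel_sq_norm = (vals**2).sum() * cell                               # ||K||_2^2
```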

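Assumptions on the boundary $\beta = \{x : f(x) = c\}$ for $d \ge 2$.

(B.i) For all $(y_1, \ldots, y_d) \in \beta$,

$$f'(y) = f'(y_1, \ldots, y_d) = \left(\frac{\partial f(y_1, \ldots, y_d)}{\partial y_1}, \ldots, \frac{\partial f(y_1, \ldots, y_d)}{\partial y_d}\right) \ne 0.$$

Define

$$I_d = \begin{cases}[0, 2\pi), & d = 2,\\ [0, \pi]^{d-2}\times[0, 2\pi), & d > 2.\end{cases}$$

The $(d-1)$-sphere

$$S^{d-1} = \{x \in \mathbb{R}^d : |x| = 1\}$$

can be parameterized by [e.g., see Lang (1997)]

$$x(\theta) = (x_1(\theta), \ldots, x_d(\theta)), \qquad \theta \in I_d,$$

where

$$\begin{aligned} x_1(\theta) &= \cos(\theta_1),\\ x_2(\theta) &= \sin(\theta_1)\cos(\theta_2),\\ x_3(\theta) &= \sin(\theta_1)\sin(\theta_2)\cos(\theta_3),\\ &\ \ \vdots\\ x_{d-1}(\theta) &= \sin(\theta_1)\cdots\sin(\theta_{d-2})\cos(\theta_{d-1}),\\ x_d(\theta) &= \sin(\theta_1)\cdots\sin(\theta_{d-2})\sin(\theta_{d-1}).\end{aligned}$$

(B.ii) We assume that the boundary $\beta$ can be written as

$$\beta = \bigcup_{j=1}^{k}\beta_j, \qquad \text{with } \inf\{|x - y| : x \in \beta_j,\ y \in \beta_l\} > 0 \text{ if } j \ne l,$$

where each $\beta_j$ is diffeomorphic to $S^{d-1}$. This means that $\beta_j$ is parameterized by a function

$$y(\theta) = (y_1(\theta), \ldots, y_d(\theta)), \qquad \theta \in I_d,$$

that is a function (depending on $j$) of the above parameterization $x(\theta)$ of $S^{d-1}$ and is 1-1 on $J_d$, the interior of $I_d$. We write

$$\frac{\partial y(\theta)}{\partial \theta_i} = \left(\frac{\partial y_1(\theta)}{\partial \theta_i}, \ldots, \frac{\partial y_d(\theta)}{\partial \theta_i}\right), \qquad \theta \in J_d.$$

We further assume that for each $j = 1, \ldots, k$ and $i = 1, \ldots, d - 1$, the function $\partial y(\theta)/\partial \theta_i$ is continuous and uniformly bounded on $J_d$.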
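The parameterization $x(\theta)$ of $S^{d-1}$ is the usual hyperspherical-coordinate map. A small helper of our own that evaluates it for any $d \ge 2$ is useful for checking that the formulas above land on the unit sphere:

```python
import numpy as np

def sphere_point(theta):
    """x(theta) on S^{d-1} for theta = (theta_1, ..., theta_{d-1}) in I_d."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size + 1
    x = np.empty(d)
    sin_prod = 1.0                  # running product sin(theta_1) ... sin(theta_{i-1})
    for i in range(d - 1):
        x[i] = sin_prod * np.cos(theta[i])
        sin_prod *= np.sin(theta[i])
    x[d - 1] = sin_prod             # x_d = sin(theta_1) ... sin(theta_{d-1})
    return x
```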

Assumptions on the boundary $\beta$ for $d = 1$.

(B.i) $\inf_{z \in \beta} |f'(z)| =: \rho_0 > 0$.

(B.ii) $\beta = \{z_1, \ldots, z_k\}$, $k \ge 2$.

[Condition (B.i) and the integrability of $f$ imply that the case $k = 1$ cannot occur when $d = 1$.]



Assumptions on $G$.

(G) The measure $G$ has a bounded continuous Radon-Nikodym derivative $g$ with respect to Lebesgue measure $\lambda$. There exists a constant $0 < \gamma_G \le \infty$ such that the following holds.

In the case $d \ge 2$ there exists a function $g^{(1)}(\cdot, \cdot)$, bounded on $I_d \times S^{d-1}$, such that for each $j = 1, \ldots, k$ and some $c_j \ge 0$,

$$\sup_{\theta \in I_d}\ \sup_{z \in S^{d-1}}\left|\frac{g(y(\theta) + a z)}{|a|^{1/\gamma_G}} - c_j\, g^{(1)}(\theta, z)\right| = o(1) \quad \text{as } a \searrow 0,$$

with $0 < \sup_{\theta \in I_d}\sup_{z \in S^{d-1}} |g^{(1)}(\theta, z)| < \infty$. Here $y(\theta)$ is the parameterization pertaining to $\beta_j$, and at least one of the $c_j$ is strictly positive.

In the case $d = 1$ there exists a function $g^{(1)}(\cdot)$ with $0 < |g^{(1)}(z_j)| < \infty$, $j = 1, \ldots, k$, such that for each $j = 1, \ldots, k$ and some $c_j \ge 0$,

$$\sup_{z \in \{-1, 1\}}\left|\frac{g(z_j + a z)}{|a|^{1/\gamma_G}} - c_j\, g^{(1)}(z_j)\right| = o(1) \quad \text{as } a \searrow 0,$$

with at least one of the $c_j$ strictly positive. By convention, in the above statement $1/\infty = 0$.

Assumptions on $h_n$. As $n \to \infty$,

(H) $\sqrt{n h_n^{1+2/d}} \to \gamma$ with $0 \le \gamma < \infty$, and $n h_n/\log n \to \infty$, where $\gamma = 0$ in the case $d = 1$.

Discussion of the assumptions and some implications.

Discussion of assumption (G). Measures $G$ of particular interest that satisfy assumption (G) are given by $g(x) = |f(x) - c|^p$ with $p \ge 0$, and also by $g(x) = f(x)$. The latter, of course, leads to the $F$-measure of the symmetric difference. The former is connected to the $L_p$-norm of the kernel density estimator (see the discussion in Section 2.2). As pointed out in the Introduction, the choice $p = 1$ is closely connected to the excess risk from the classification literature. The choice $p = 0$ yields the Lebesgue measure of the symmetric difference.

Assumptions (B.i) and (D.i) imply that (G) holds for $g(x) = |f(x) - c|^p$ with $1/\gamma_G = p$. For $g = f$ we have $1/\gamma_G = 0$ [notice that by (D.ii) we have $c > 0$].

Discussion of smoothness assumptions on $f$. Our smoothness assumptions on $f$ imply that $f$ has a $\gamma$-exponent with $\gamma = 1$ at the level $c$. That is, for some constant $C > 0$ and all small enough $\varepsilon > 0$,

$$F\{x \in \mathbb{R}^d : |f(x) - c| \le \varepsilon\} \le C\varepsilon.$$

[This fact follows from the Lebesgue-Besicovitch theorem; e.g., see Cadre (2006).] This type of assumption is common in the level set estimation literature. It was first used by Polonik (1995) in the context of density level set estimation.

Implications of (B) in the case $d \ge 2$. Below we record some conventions and implications of assumption (B) that are needed in the proof of our theorem. Using the notation introduced in assumption (B), for points $\theta$ on the boundary of $I_d$ we define $\partial y(\theta)/\partial \theta_i$ to be the limit taken from points in $J_d$. In this way each vector $\partial y(\theta)/\partial \theta_i$ is continuous and bounded on the closure of $I_d$.

In the case $d \ge 2$, $f(y(\theta)) = c$ for all $\theta$. Hence for each $j = 1, \ldots, k$ and $i = 1, \ldots, d - 1$,

$$\frac{\partial f(y(\theta))}{\partial \theta_i} = \frac{\partial f(y(\theta))}{\partial y_1}\frac{\partial y_1(\theta)}{\partial \theta_i} + \cdots + \frac{\partial f(y(\theta))}{\partial y_d}\frac{\partial y_d(\theta)}{\partial \theta_i} = 0, \tag{2.9}$$

where $y(\theta)$ is the parameterization pertaining to $\beta_j$. This implies that the unit vector

$$u(\theta) = (u_1(\theta), \ldots, u_d(\theta)) := \frac{f'(y(\theta))}{|f'(y(\theta))|} \tag{2.10}$$

is normal to the tangent space of $\beta_j$ at $y(\theta)$.

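From assumption (B.ii) we infer that $\beta$ is compact. Combined with (B.i), this gives

$$\inf_{(y_1, \ldots, y_d) \in \beta}|f'(y_1, \ldots, y_d)| =: \rho_0 > 0. \tag{2.11}$$

In turn, assumptions (D.ii), (B.i) and (B.ii), combined with (2.11), imply that for each $1 \le i \le d - 1$ the vector

$$\frac{\partial u(\theta)}{\partial \theta_i} = \left(\frac{\partial u_1(\theta)}{\partial \theta_i}, \ldots, \frac{\partial u_d(\theta)}{\partial \theta_i}\right)$$

is uniformly bounded on $I_d$.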

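For each $j = 1, \ldots, k$, with $y(\theta)$ being the parameterization pertaining to $\beta_j$, consider the absolute value of the determinant

$$\iota(\theta) := \left|\det\left(\frac{\partial y(\theta)}{\partial \theta_1}, \ldots, \frac{\partial y(\theta)}{\partial \theta_{d-1}}, u(\theta)\right)\right|. \tag{2.12}$$

We can infer from (B.ii) that

$$\sup_{\theta \in I_d}\iota(\theta) < \infty. \tag{2.13}$$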

2.5. Proof of Theorem 1 in the case $d \ge 2$. We present a detailed proof only for the case $k = 1$. At the end we describe how the proof of the general case $k > 1$ goes. For ease of notation, we therefore drop the subscript $j$ in the above assumptions. We also assume $c_1 = 1$ in assumption (G).

We first show that, with a suitably defined sequence of centerings $b_n$,

$$\left(\frac{n}{h_n}\right)^{1/4}\left(\sqrt{n h_n}\right)^{1/\gamma_G}\{d_G(C_n(c), C(c)) - b_n\} \xrightarrow{d} \sigma Z \tag{2.14}$$

for some $\sigma > 0$. (For notational convenience, in the proof we write $\sigma^2 = \sigma_G^2$.) From this result we shall infer that our central limit theorem (2.1) holds. The asymptotic variance $\sigma^2$ will be defined in the course of the proof; its final form appears in (2.57) below.

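Theorem 1 of Einmahl and Mason (2005) implies the following. When $h_n$ satisfies (H) and $f$ is bounded, then for some constant $\gamma_1 > 0$,

$$\limsup_{n\to\infty}\sqrt{\frac{n h_n}{\log n}}\,\sup_{x \in \mathbb{R}^d}|f_n(x) - E f_n(x)| \le \gamma_1, \quad \text{a.s.} \tag{2.15}$$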
It is not difficult to see that, under the assumptions (D), (K) and (H), for some $\gamma_2 > 0$,

$$\sup_{n \ge 1}\sqrt{n h_n}\,\sup_x|E f_n(x) - f(x)| \le \gamma_2. \tag{2.16}$$

[See Mason and Polonik (2008), Appendix, Detail 2.] With $\varsigma > \sqrt{2} \vee \gamma_1$, set

$$E_n = \left\{x : |f(x) - c| < \frac{\varsigma\sqrt{\log n}}{\sqrt{n h_n}}\right\}. \tag{2.17}$$

By (1.1), (2.15) and (2.16), with probability 1, for all large enough $n$,

$$G(C_n(c)\,\Delta\, C(c)) = \int_{E_n}|I\{f_n(x) \ge c\} - I\{f(x) \ge c\}|\, g(x)\, dx =: L_n(c). \tag{2.18}$$

Rather than considering the truncated quantity $L_n(c)$ directly, it turns out to be more convenient to first study a Poissonized version of $L_n(c)$. It is formed by replacing $f_n(x)$ by

$$\pi_n(x) = \frac{1}{n h_n}\sum_{i=1}^{N_n} K\!\left(\frac{x - X_i}{h_n^{1/d}}\right),$$

where $N_n$ is a mean $n$ Poisson random variable independent of $X_1, X_2, \ldots$. [When $N_n = 0$ we set $\pi_n(x) = 0$.] Notice that

$$E\pi_n(x) = E f_n(x).$$

We shall make repeated use of the following fact, which follows from the assumption that $K$ has support contained in the closed ball of radius 1/2 centered at zero: $\pi_n(x)$ and $\pi_n(y)$ are independent whenever $|x - y| > h_n^{1/d}$.
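Two properties of $\pi_n$ are used repeatedly below: $E\pi_n(x) = Ef_n(x)$, and the independence of $\pi_n(x)$ and $\pi_n(y)$ for $|x - y| > h_n^{1/d}$. Both can be seen in a small simulation. The sketch assumes $d = 1$, standard normal data and a hypothetical kernel supported on $[-1/2, 1/2]$. It compares Monte Carlo means and correlations for the Poissonized estimator and the fixed-$n$ estimator.

```python
import numpy as np

def kernel(t):
    # Hypothetical 1D kernel supported on [-1/2, 1/2]
    return np.where(np.abs(t) <= 0.5, 1.5 * (1.0 - 4.0 * t**2), 0.0)

def kde_at(points, n, h, rng, poissonized):
    """f_n (sample size n) or pi_n (sample size N_n ~ Poisson(n)) at the given points, d = 1.
    Both are normalized by n * h; the data are N(0, 1) for illustration."""
    size = rng.poisson(n) if poissonized else n
    X = rng.standard_normal(size)
    return kernel((points[:, None] - X[None, :]) / h).sum(axis=1) / (n * h)

rng = np.random.default_rng(1)
n, h, reps = 500, 0.2, 4000
points = np.array([0.0, 0.25])   # |x - y| = 0.25 > h = h^{1/d}: the kernel windows are disjoint
pois = np.array([kde_at(points, n, h, rng, True) for _ in range(reps)])
fixed = np.array([kde_at(points, n, h, rng, False) for _ in range(reps)])

mean_gap = abs(pois[:, 0].mean() - fixed[:, 0].mean())    # E pi_n(x) = E f_n(x)
corr_pois = np.corrcoef(pois[:, 0], pois[:, 1])[0, 1]     # near 0: independent
corr_fixed = np.corrcoef(fixed[:, 0], fixed[:, 1])[0, 1]  # typically slightly negative
```

For the fixed-$n$ estimator the counts in disjoint windows are multinomial, so `corr_fixed` typically comes out slightly negative. Poissonization removes exactly this dependence.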

Here is the Poissonized version of $L_n(c)$ that we treat first. Define

$$\Pi_n(c) = \int_{E_n}|I\{\pi_n(x) \ge c\} - I\{f(x) \ge c\}|\, g(x)\, dx. \tag{2.19}$$

Our goal is to infer a central limit theorem for $L_n(c)$, and thus for $G(C_n(c)\,\Delta\, C(c))$, from a central limit theorem for $\Pi_n(c)$. Set

$$\Delta_n(x) = |I\{\pi_n(x) \ge c\} - I\{f(x) \ge c\}|. \tag{2.20}$$

The first item on this agenda is to verify that $(n/h_n)^{1/4}(\sqrt{n h_n})^{1/\gamma_G}$ is the correct sequence of norming constants. To do this we must analyze the exact asymptotic behavior of the variance of $\Pi_n(c)$. We have

$$\mathrm{Var}(\Pi_n(c)) = \mathrm{Var}\left(\int_{E_n}\Delta_n(x)\, dG(x)\right) = \int_{E_n}\int_{E_n}\mathrm{cov}(\Delta_n(x), \Delta_n(y))\, dG(x)\, dG(y).$$



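Let

$$Y_n(x) = \frac{\sum_{j \le N_1} K\big((x - X_j)/h_n^{1/d}\big)}{\sqrt{E K^2\big((x - X)/h_n^{1/d}\big)}},$$

where $N_1$ denotes a mean 1 Poisson random variable independent of $X_1, X_2, \ldots$. Let $Y_n^{(1)}(x), \ldots, Y_n^{(n)}(x)$ be i.i.d. copies of $Y_n(x)$. Clearly

$$\bar\pi_n(x) := \frac{\pi_n(x) - E\pi_n(x)}{\sqrt{\mathrm{Var}(\pi_n(x))}} \stackrel{d}{=} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(Y_n^{(i)}(x) - E Y_n^{(i)}(x)\big).$$

Set

$$c_n(x) = \frac{\sqrt{n h_n}\,(c - E f_n(x))}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d}\big)}} = \frac{\sqrt{n h_n}\,\big(c - \int_{\mathbb{R}^d} K(y) f(x - y h_n^{1/d})\, dy\big)}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d}\big)}}.$$

Since $K$ has support contained in the closed ball of radius 1/2 around zero, $\Delta_n(x)$ and $\Delta_n(y)$ are independent whenever $|x - y| > h_n^{1/d}$. Hence

$$\mathrm{Var}\left(\int_{E_n}\Delta_n(x)\, dG(x)\right) = \int_{E_n}\int_{E_n} I\{|x - y| \le h_n^{1/d}\}\,\mathrm{cov}(\Delta_n(x), \Delta_n(y))\, dG(x)\, dG(y),$$

where we now write

$$\Delta_n(x) = \left|I\{\bar\pi_n(x) \ge c_n(x)\} - I\left\{0 \ge \frac{\sqrt{n h_n}\,(c - f(x))}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d}\big)}}\right\}\right|.$$

The change of variables $y = x + t h_n^{1/d}$, $t \in B$, with

$$B = \{t : |t| \le 1\}, \tag{2.21}$$

gives

$$\mathrm{Var}\left(\int_{E_n}\Delta_n(x)\, dG(x)\right) = h_n\int_{E_n}\int_B g_n(x, t)\, dt\, dx, \tag{2.22}$$

where

$$g_n(x, t) = I_{E_n}(x)\, I_{E_n}(x + t h_n^{1/d})\,\mathrm{cov}\big(\Delta_n(x), \Delta_n(x + t h_n^{1/d})\big)\, g(x)\, g(x + t h_n^{1/d}). \tag{2.23}$$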

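For ease of notation let $a_n = a_{n,G} = (n/h_n)^{1/4}(\sqrt{n h_n})^{1/\gamma_G}$. We intend to prove that

$$\begin{aligned}\lim_{n\to\infty} a_n^2\,\mathrm{Var}\left(\int_{E_n}\Delta_n(x)\, dG(x)\right) &= \lim_{n\to\infty} a_n^2 h_n\int_{E_n}\int_B g_n(x, t)\, dt\, dx\\ &= \lim_{n\to\infty}(n h_n)^{1/2 + 1/\gamma_G}\int_{E_n}\int_B g_n(x, t)\, dt\, dx\\ &= \lim_{\tau\to\infty}\lim_{n\to\infty}(n h_n)^{1/2 + 1/\gamma_G}\int_{D_n(\tau)}\int_B g_n(x, t)\, dt\, dx =: \sigma^2 < \infty,\end{aligned}\tag{2.24}$$

where

$$D_n(\tau) := \left\{z : z = y(\theta) + \frac{s\,u(\theta)}{\sqrt{n h_n}},\ \theta \in I_d,\ |s| \le \tau\right\}.$$

The set $D_n(\tau)$ forms a band of thickness $2\tau/\sqrt{n h_n}$ around the surface $\beta$.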

Recall the definition of $B$ in (2.21). Since $\beta$ is a closed submanifold of $\mathbb{R}^d$ without boundary, we can apply the tubular neighborhood theorem [see Theorem 11.4 on page 93 of Bredon (1993)]. It says that for all sufficiently small $\delta > 0$ and each $x \in \beta + \delta B$, there are a unique $\theta \in I_d$ and $|s| \le \delta$ such that $x = y(\theta) + s u(\theta)$. This, in turn, implies that for all sufficiently small $\delta > 0$,

$$\{y(\theta) + s u(\theta) : \theta \in I_d \text{ and } |s| \le \delta\} = \beta + \delta B.$$

In particular, by (H), for all large enough $n$,

$$D_n(\tau) = \beta + \frac{\tau}{\sqrt{n h_n}}B, \quad \text{where } B = \{z : |z| \le 1\}. \tag{2.25}$$

Moreover, $x = y(\theta) + s u(\theta)$ with $\theta \in I_d$ and $|s| \le \delta$ is a well-defined parameterization of $\beta + \delta B$. This validates the change of variables in the integrals below.

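We now turn to the proof of (2.24). Let

$$\rho_n(x, x + t h_n^{1/d}) = \mathrm{cov}\big(\bar\pi_n(x), \bar\pi_n(x + t h_n^{1/d})\big) = \frac{h_n^{-1} E\big[K\big((x - X)/h_n^{1/d}\big) K\big((x - X)/h_n^{1/d} + t\big)\big]}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d}\big)\; h_n^{-1} E K^2\big((x - X)/h_n^{1/d} + t\big)}},$$

where $\bar\pi_n$ denotes $\pi_n$ standardized to have mean zero and variance one. It is routine to show the following. For each $\theta \in I_d$, $|s| \le \tau$, $x = y(\theta) + s u(\theta)/\sqrt{n h_n}$ and $t \in B$, as $n \to \infty$,

$$\rho_n(x, x + t h_n^{1/d}) = \rho_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}},\ y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}} + t h_n^{1/d}\right) \to \rho(t),$$

where

$$\rho(t) = \frac{\int_{\mathbb{R}^d} K(u) K(u + t)\, du}{\int_{\mathbb{R}^d} K^2(u)\, du}.$$

[See Mason and Polonik (2008), Appendix, Detail 4.] Notice that $\rho(t) = \rho(-t)$. By the central limit theorem, for each $\theta \in I_d$, $|s| \le \tau$ and $t \in B$,

$$\left(\bar\pi_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}\right),\ \bar\pi_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}} + t h_n^{1/d}\right)\right) \xrightarrow{d} \left(Z_1,\ \rho(t) Z_1 + \sqrt{1 - \rho^2(t)}\, Z_2\right), \tag{2.26}$$

where $Z_1$ and $Z_2$ are independent standard normal random variables.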
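The limiting correlation $\rho(t)$ depends only on the kernel. The sketch below, our own and for $d = 1$, evaluates it by a midpoint rule. For the uniform kernel on $[-1/2, 1/2]$ one gets $\rho(t) = (1 - |t|)_+$. In general, the support condition in (K.i) forces $\rho(t) = 0$ for $|t| \ge 1$. This matches the independence of $\pi_n(x)$ and $\pi_n(y)$ for $|x - y| > h_n^{1/d}$.

```python
import numpy as np

def rho(t, kernel, m=200_001):
    """rho(t) = int K(u) K(u + t) du / int K(u)^2 du for d = 1, by a midpoint rule on [-1.5, 1.5].
    The range suffices for kernels supported on [-1/2, 1/2] and |t| <= 1."""
    du = 3.0 / m
    u = -1.5 + (np.arange(m) + 0.5) * du
    num = np.sum(kernel(u) * kernel(u + t)) * du
    den = np.sum(kernel(u) ** 2) * du
    return num / den

def uniform_kernel(u):
    return np.where(np.abs(u) <= 0.5, 1.0, 0.0)

def epanechnikov_half(u):
    # Hypothetical Epanechnikov-type kernel rescaled to [-1/2, 1/2]
    return np.where(np.abs(u) <= 0.5, 1.5 * (1.0 - 4.0 * u**2), 0.0)
```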

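Our assumptions and straightforward Taylor expansions also give the following. For $|s| \le \tau$, $u = s u(\theta)$, $x = y(\theta) + u/\sqrt{n h_n}$ and $\theta \in I_d$,

$$\begin{aligned} c_n(x) = c_n\!\left(y(\theta) + \frac{u}{\sqrt{n h_n}}\right) &= \frac{\sqrt{n h_n}\,\big(c - E f_n(y(\theta) + u/\sqrt{n h_n})\big)}{\sqrt{h_n^{-1} E K^2\big((y(\theta) + u/\sqrt{n h_n} - X)/h_n^{1/d}\big)}}\\ &\to -\frac{f'(y(\theta))\cdot u}{\sqrt{f(y(\theta))}\,\|K\|_2} = -\frac{s\,|f'(y(\theta))|}{\sqrt{c}\,\|K\|_2} =: c(s, \theta, 0).\end{aligned}\tag{2.27}$$

Here $\|K\|_2^2 = \int K^2(u)\, du$, and we used $f(y(\theta)) = c$ and $f'(y(\theta))\cdot u(\theta) = |f'(y(\theta))|$. Similarly, since $\sqrt{n h_n^{1+2/d}} \to \gamma$,

$$c_n(x + t h_n^{1/d}) \to -\frac{f'(y(\theta))\cdot(s u(\theta) + \gamma t)}{\sqrt{c}\,\|K\|_2} =: c(s, \theta, \gamma t).$$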
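We also have

$$\frac{\sqrt{n h_n}\,(c - f(x))}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d}\big)}} \to c(s, \theta, 0)$$

and

$$\frac{\sqrt{n h_n}\,(c - f(x + t h_n^{1/d}))}{\sqrt{h_n^{-1} E K^2\big((x - X)/h_n^{1/d} + t\big)}} \to c(s, \theta, \gamma t).$$

[See Mason and Polonik (2008), Appendix, Detail 5.] Hence, by (2.26) and (G), for $y(\theta) \in \beta$,

$$\begin{aligned}(n h_n)^{1/\gamma_G} g_n(x, t) &= I_{E_n}\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}\right) I_{E_n}\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}} + t h_n^{1/d}\right)\\ &\quad\times\mathrm{cov}\left(\Delta_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}\right), \Delta_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}} + t h_n^{1/d}\right)\right)\\ &\quad\times(n h_n)^{1/\gamma_G}\, g\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}\right) g\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}} + t h_n^{1/d}\right)\\ &\to \mathrm{cov}\Big(|I\{Z_1 \ge c(s, \theta, 0)\} - I\{0 \ge c(s, \theta, 0)\}|,\\ &\qquad\quad |I\{\rho(t) Z_1 + \sqrt{1 - \rho^2(t)}\, Z_2 \ge c(s, \theta, \gamma t)\} - I\{0 \ge c(s, \theta, \gamma t)\}|\Big)\\ &\quad\times|s|^{1/\gamma_G}\, g^{(1)}\!\left(\theta, \frac{s u(\theta)}{|s|}\right)|s u(\theta) + \gamma t|^{1/\gamma_G}\, g^{(1)}\!\left(\theta, \frac{s u(\theta) + \gamma t}{|s u(\theta) + \gamma t|}\right)\\ &=: T(\theta, s, t).\end{aligned}\tag{2.28}$$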
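The first factor of $T(\theta, s, t)$ in (2.28) is a covariance of two indicator functionals of a correlated normal pair, so it can be approximated by simulation. The sketch below is a Monte Carlo approximation of our own, not a formula from the paper. In it, `c0` and `c1` stand for $c(s,\theta,0)$ and $c(s,\theta,\gamma t)$, and `rho` stands for $\rho(t)$. The example uses two checks. For $\rho = 1$ and `c0 = c1` $< 0$ the covariance reduces to $\Phi(c_0)(1 - \Phi(c_0))$, and for $\rho = 0$ it vanishes.

```python
import numpy as np
from statistics import NormalDist

def limiting_cov(c0, c1, rho, n_samples=1_000_000, seed=0):
    """Monte Carlo value of
    cov(|I{Z1 >= c0} - I{0 >= c0}|, |I{rho Z1 + sqrt(1 - rho^2) Z2 >= c1} - I{0 >= c1}|)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_samples)
    z2 = rng.standard_normal(n_samples)
    w = rho * z1 + np.sqrt(1.0 - rho**2) * z2
    a = np.abs((z1 >= c0).astype(float) - float(0 >= c0))
    b = np.abs((w >= c1).astype(float) - float(0 >= c1))
    return np.mean(a * b) - np.mean(a) * np.mean(b)

# Checks: identical indicators (rho = 1) and independent ones (rho = 0)
check_same = limiting_cov(-0.5, -0.5, 1.0)
exact_same = NormalDist().cdf(-0.5) * (1.0 - NormalDist().cdf(-0.5))
check_indep = limiting_cov(-0.5, -0.8, 0.0)
```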
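Using the change of variables

$$x_1 = y_1(\theta) + \frac{s\,u_1(\theta)}{\sqrt{n h_n}},\ \ldots,\ x_d = y_d(\theta) + \frac{s\,u_d(\theta)}{\sqrt{n h_n}}, \tag{2.29}$$

we get

$$\int_{D_n(\tau)}\int_B g_n(x, t)\, dt\, dx = \int_{-\tau}^{\tau}\int_{I_d}\int_B g_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}, t\right)|J_n(\theta, s)|\, dt\, d\theta\, ds,$$

where

$$|J_n(\theta, s)| = \left|\det\left(\frac{\partial y(\theta)}{\partial \theta_1} + \frac{s}{\sqrt{n h_n}}\frac{\partial u(\theta)}{\partial \theta_1}, \ldots, \frac{\partial y(\theta)}{\partial \theta_{d-1}} + \frac{s}{\sqrt{n h_n}}\frac{\partial u(\theta)}{\partial \theta_{d-1}}, \frac{u(\theta)}{\sqrt{n h_n}}\right)\right|. \tag{2.30}$$

Clearly, with $\iota(\theta)$ as in (2.12),

$$\sqrt{n h_n}\,|J_n(\theta, s)| \to \iota(\theta). \tag{2.31}$$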



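Under our assumptions, $\sqrt{n h_n}\,|J_n(\theta, s)|$ is uniformly bounded in $n \ge 1$ and $(\theta, s) \in I_d \times [-\tau, \tau]$. Also, by (G), $(n h_n)^{1/\gamma_G} g_n$ is bounded on $D_n(\tau) \times B$ for all large enough $n$. Thus $(n h_n)^{1/\gamma_G} g_n$ and $\sqrt{n h_n}\,|J_n|$ are eventually bounded on the appropriate domains, and (2.28) and (2.31) hold. The dominated convergence theorem and (G) then give

$$\begin{aligned}(n h_n)^{1/2 + 1/\gamma_G}\int_{D_n(\tau)}\int_B g_n(x, t)\, dt\, dx &= (n h_n)^{1/2 + 1/\gamma_G}\int_{-\tau}^{\tau}\int_{I_d}\int_B g_n\!\left(y(\theta) + \frac{s u(\theta)}{\sqrt{n h_n}}, t\right)|J_n(\theta, s)|\, dt\, d\theta\, ds\\ &\to \int_{-\tau}^{\tau}\int_{I_d}\int_B T(\theta, s, t)\,\iota(\theta)\, dt\, d\theta\, ds \quad \text{as } n \to \infty.\end{aligned}\tag{2.32}$$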

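We claim that, as $\tau \to \infty$,

$$\int_{-\tau}^{\tau}\int_{I_d}\int_B T(\theta, s, t)\,\iota(\theta)\, dt\, d\theta\, ds \to \int_{-\infty}^{\infty}\int_{I_d}\int_B T(\theta, s, t)\,\iota(\theta)\, dt\, d\theta\, ds =: \sigma^2 < \infty \tag{2.33}$$

and

$$\lim_{\tau\to\infty}\limsup_{n\to\infty}(n h_n)^{1/2 + 1/\gamma_G}\int_{D_n^c(\tau)\cap E_n}\int_B g_n(x, t)\, dt\, dx = 0. \tag{2.34}$$

In light of (2.32), these two claims imply that the limit in (2.24) equals $\sigma^2$ as defined in (2.33).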

First we show (2.33). Consider

$$\Gamma^+(\tau) := \int_0^{\tau}\int_{I_d}\int_B T(\theta, s, t)\,\iota(\theta)\, dt\, d\theta\, ds.$$

We shall show that the limit $\lim_{\tau\to\infty}\Gamma^+(\tau)$ exists and is finite. Similar arguments apply to

$$\lim_{\tau\to\infty}\Gamma^-(\tau) := \lim_{\tau\to\infty}\int_{-\tau}^{0}\int_{I_d}\int_B T(\theta, s, t)\,\iota(\theta)\, dt\, d\theta\, ds < \infty.$$

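Observe that when $s > 0$ we have $c(s, \theta, 0) < 0$, and hence

$$|I\{Z_1 \ge c(s, \theta, 0)\} - I\{0 \ge c(s, \theta, 0)\}| = I\{Z_1 < c(s, \theta, 0)\}.$$

With $\Phi$ denoting the cdf of a standard normal distribution, we write

$$E\big(I\{Z_1 < c(s, \theta, 0)\}\big) = \Phi(c(s, \theta, 0)).$$

We now use (2.28), the assumed finiteness of $\sup_{\theta\in I_d}\sup_{z\in S^{d-1}}|g^{(1)}(\theta, z)|$, and the elementary inequality

$$|\mathrm{cov}(X, Y)| \le 2E|X| \quad \text{whenever } |Y| \le 1.$$

Together these give, for all $s > 0$ and some $c_1 > 0$,

$$|T(\theta, s, t)| \le c_1|s|^{1/\gamma_G}\big(|s|^{1/\gamma_G} + \gamma^{1/\gamma_G}\big)\Phi(c(s, \theta, 0)). \tag{2.35}$$

The lower bound (2.11) implies that, with $c_2 = \rho_0/(\sqrt{c}\,\|K\|_2) > 0$, for all $s > 0$ and $\theta \in I_d$,

$$\Phi(c(s, \theta, 0)) = \Phi\!\left(-\frac{s\,|f'(y(\theta))|}{\sqrt{c}\,\|K\|_2}\right) \le \Phi(-c_2 s) = 1 - \Phi(c_2 s).$$

Together with (2.35) and (2.13), it follows that for some $c_3 > 0$,

$$\lim_{\tau\to\infty}\Gamma^+(\tau) \le c_3\lim_{\tau\to\infty}\int_0^{\tau} s^{1/\gamma_G}\big(s^{1/\gamma_G} + \gamma^{1/\gamma_G}\big)\big(1 - \Phi(c_2 s)\big)\, ds < \infty.$$

Similarly,

$$\int_{\tau}^{\infty} s^{1/\gamma_G}\big(s^{1/\gamma_G} + \gamma^{1/\gamma_G}\big)\big(1 - \Phi(c_2 s)\big)\, ds \to 0$$

as $\tau \to \infty$. This validates claim (2.33).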

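Next we turn to the proof of (2.34). Recall the definition of $g_n(x, t)$ in (2.23). For all large enough $n$,

$$\begin{aligned}(n h_n)^{1/2 + 1/\gamma_G}\int_{D_n^c(\tau)\cap E_n}\int_B g_n(x, t)\, dt\, dx &\le \int_{D_n^c(\tau)\cap E_n}\int_B\big|\mathrm{cov}\big(\Delta_n(x), \Delta_n(x + t h_n^{1/d})\big)\big|\\ &\qquad\times(n h_n)^{1/2 + 1/\gamma_G} g(x)\, g(x + t h_n^{1/d})\, dt\, dx \qquad (2.36)\\ &\le \int_{D_n^c(\tau)\cap E_n}\int_B\big(\mathrm{Var}(\Delta_n(x))\big)^{1/2}\\ &\qquad\times(n h_n)^{1/2 + 1/\gamma_G} g(x)\, g(x + t h_n^{1/d})\, dt\, dx. \qquad (2.37)\end{aligned}$$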
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The last inequality uses the fact that An{x + thl/'^) < 1 and thus Var(A„(x + 
thl/'^)) < 1. Applying the inequality 

(2.38) \I{a >b}-I{0>b}\< I{\a\ > \b\}, 

we obtain that 

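$$\begin{aligned}\operatorname{Var}(\Delta_n(x)) &= \operatorname{Var}\big(|I\{\pi_n(x)\ge c\}-I\{f(x)\ge c\}|\big)\\ &\le E\big(I\{|\pi_n(x)-f(x)|\ge|c-f(x)|\}\big)^2\\ &= P\{|\pi_n(x)-f(x)|\ge|c-f(x)|\}.\end{aligned}$$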
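Inequality (2.38) can be confirmed by brute force. The short Python sketch below is an illustration, not part of the original argument; the function name `violations` and the grid of test values are arbitrary choices. It enumerates pairs $(a,b)$, including ties and $b = 0$, and finds no violations of the non-strict form. A version with strict inequalities fails exactly at $a = b < 0$, which is why the non-strict form is the one to use here.

```python
def indicator(cond):
    """1 if cond is true, else 0."""
    return 1 if cond else 0


def violations(values, strict=False):
    """Pairs (a, b) from values x values with
    |I{a R b} - I{0 R b}| > I{|a| R |b|},
    where R is '>=' as in (2.38), or '>' when strict=True."""
    rel = (lambda p, q: p > q) if strict else (lambda p, q: p >= q)
    bad = []
    for a in values:
        for b in values:
            left = abs(indicator(rel(a, b)) - indicator(rel(0, b)))
            right = indicator(rel(abs(a), abs(b)))
            if left > right:
                bad.append((a, b))
    return bad


grid = [k / 4 for k in range(-12, 13)]    # -3.0, -2.75, ..., 3.0 (includes ties and 0)
print(len(violations(grid)))              # non-strict form (2.38)
print(violations(grid, strict=True)[:3])  # strict form: fails at a == b < 0
```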

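Thus we get that

$$\begin{aligned}&\int_{D_n^c(\tau)\cap E_n}\int_B \big(\operatorname{Var}(\Delta_n(x))\big)^{1/2}(nh_n)^{1/2+\gamma}g(x)g(x+th_n^{1/d})\,dt\,dx\\ &\quad\le \int_{D_n^c(\tau)\cap E_n}\int_B \sqrt{P\{|\pi_n(x)-f(x)|\ge|c-f(x)|\}}\,(nh_n)^{1/2+\gamma}g(x)g(x+th_n^{1/d})\,dt\,dx.\end{aligned}\tag{2.39}$$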

We must bound the probability inside the integral. For this purpose we need 
a lemma. 

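Lemma 2.1. Let $Y, Y_1, Y_2, \dots$ be i.i.d. with mean $\mu$ and bounded in absolute value by $M$, where $0 < M < \infty$. Independent of $Y_1, Y_2, \dots$ let $N_n$ be a Poisson random variable with mean $n$. For any $v \ge 2(e30)^2EY^2$ and with $d = e30M$ we have for all $\lambda > 0$,

$$P\Big\{\sum_{i=1}^{N_n}Y_i - n\mu \ge \lambda\Big\} \le \exp\Big(-\frac{\lambda^2/2}{nv+d\lambda}\Big). \tag{2.40}$$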


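Proof. Let $N$ be a Poisson random variable with mean 1 independent of $Y_1, Y_2, \dots$ and let

$$\omega = \sum_{i=1}^{N}Y_i.$$

Clearly, if $\omega_1, \dots, \omega_n$ are i.i.d. copies of $\omega$, then

$$\sum_{i=1}^{N_n}Y_i \stackrel{d}{=} \sum_{i=1}^{n}\omega_i.$$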

Our aim is to use Bernstein's inequality to prove (2.40). Notice that for any integer $r \ge 2$,

$$E|\omega-\mu|^r = E\Big|\sum_{i=1}^{N}Y_i-\mu\Big|^r. \tag{2.41}$$
At this point we need the following fact, which is Lemma 2.3 of Giné, Mason and Zaitsev (2003).

Fact 1. If, for each $n \ge 1$, $\zeta_1, \zeta_2, \dots, \zeta_n, \dots$ are independent identically distributed random variables, $\zeta_0 = 0$, and $\eta$ is a Poisson random variable with mean $\gamma > 0$ and independent of the variables $\{\zeta_i\}_{i=1}^{\infty}$, then, for every $p > 2$,

$$E\Big|\sum_{i=0}^{\eta}\zeta_i - \gamma E\zeta_1\Big|^p \le \Big(\frac{15p}{\log p}\Big)^p\max\big[(\gamma E\zeta_1^2)^{p/2},\ \gamma E|\zeta_1|^p\big]. \tag{2.42}$$

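Applying inequality (2.42) to (2.41) gives for $r > 2$

$$E|\omega-\mu|^r = E\Big|\sum_{i=1}^{N}Y_i-\mu\Big|^r \le \Big(\frac{15r}{\log r}\Big)^r\max\big[(EY^2)^{r/2},\ E|Y|^r\big].$$

Now

$$\max\big[(EY^2)^{r/2},\ E|Y|^r\big] \le \max\big[(EY^2)(EY^2)^{r/2-1},\ (EY^2)M^{r-2}\big] \le EY^2M^{r-2}.$$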

Moreover, since $\log 2 > 1/2$, we get

$$E|\omega-\mu|^r \le (30r)^rEY^2M^{r-2}.$$

By Stirling's formula [see page 864 of Shorack and Wellner (1986)]

$$r^r \le e^r r!.$$
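The bound $r^r \le e^r r!$ also follows directly from $e^r = \sum_k r^k/k! \ge r^r/r!$. The Python sketch below is a numerical sanity check, not a proof. It compares quantities on the log scale with `math.lgamma` to avoid overflow, and checks both this bound and the chain $(15r/\log r)^r \le (30r)^r \le (e30)^r r!$ used for the moment condition. The helper names are illustrative.

```python
import math


def stirling_gap(r):
    """log(e^r r!) - log(r^r); nonnegative exactly when r^r <= e^r r!."""
    return r + math.lgamma(r + 1) - r * math.log(r)


def moment_chain_holds(r):
    """Check (15 r / log r)^r <= (30 r)^r <= (e 30)^r r! on the log scale (r >= 2)."""
    first = r * math.log(15.0 * r / math.log(r))
    second = r * math.log(30.0 * r)
    third = r * math.log(math.e * 30.0) + math.lgamma(r + 1)
    return first <= second <= third


print(min(stirling_gap(r) for r in range(1, 500)))
print(all(moment_chain_holds(r) for r in range(2, 500)))
```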

Thus

$$E|\omega-\mu|^r \le (e30)^r\,r!\,EY^2M^{r-2} = \frac{2(e30)^2EY^2}{2}\,r!\,(e30M)^{r-2} \le \frac{v}{2}\,r!\,d^{r-2},$$

where $v \ge 2(e30)^2EY^2$ and $d = e30M$. Thus by Bernstein's inequality [see page 855 of Shorack and Wellner (1986)] we get (2.40).
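To make Lemma 2.1 concrete, the following Python sketch simulates the Poissonized sum $\sum_{i\le N_n}Y_i$. It uses the illustrative choice $Y_i \sim$ Uniform$(0,1)$, so that $M = 1$, $\mu = 1/2$ and $EY^2 = 1/3$. The sketch checks two things:

- The sum has mean $n\mu$ and variance $nEY^2$, not $n\operatorname{Var}(Y)$. This is why the lemma is phrased in terms of $EY^2$.
- The empirical tail, at $\lambda$ equal to two standard deviations, lies below the right side of (2.40) with the smallest admissible $v$.

The constants $e30$ make the bound very conservative at this sample size. The helper `poisson` (Knuth's multiplication method) is an assumption of the sketch, chosen so that only the Python standard library is needed.

```python
import math
import random


def poisson(rng, lam):
    """Poisson(lam) draw via Knuth's multiplication method (adequate for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poissonized_sum_check(n=30, reps=20000, seed=1):
    """Simulate S = Y_1 + ... + Y_{N_n}, N_n ~ Poisson(n), Y_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    mu, ey2, m_bound = 0.5, 1.0 / 3.0, 1.0          # mean, E Y^2 and bound M for U(0, 1)
    sums = [sum(rng.random() for _ in range(poisson(rng, n))) for _ in range(reps)]
    mean = sum(sums) / reps
    var = sum((s - mean) ** 2 for s in sums) / (reps - 1)
    lam = 2.0 * math.sqrt(n * ey2)                   # two standard deviations of S
    tail = sum(1 for s in sums if s - n * mu >= lam) / reps
    v = 2.0 * (math.e * 30.0) ** 2 * ey2             # smallest v allowed in Lemma 2.1
    d = math.e * 30.0 * m_bound
    bound = math.exp(-(lam ** 2 / 2.0) / (n * v + d * lam))   # right side of (2.40)
    return {"mean": mean, "var": var, "n*E[Y^2]": n * ey2,
            "n*Var(Y)": n / 12.0, "tail": tail, "bound": bound}


print(poissonized_sum_check())
```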



Here is how Lemma 2.1 is used. Let $Y_i = K\big((x-X_i)/h_n^{1/d}\big)$. Since by assumption both $K$ and $f$ are bounded, and $K$ has support contained in the closed ball of radius 1/2 around zero, we obtain that for some $D_0 > 0$ and all $n \ge 1$,

$$\sup_x E\,K^2\Big(\frac{x-X}{h_n^{1/d}}\Big) \le D_0h_n.$$



Consider $z \ge a/\sqrt{nh_n}$ for some $a > 0$. With this choice, and since $\sup_x|E\pi_n(x)-f(x)| = \sup_x|Ef_n(x)-f(x)| \le A_1h_n^{2/d} < a/(2\sqrt{nh_n})$ for $n$ large enough by using (H) [see Mason and Polonik (2008), Appendix, Detail 2], we have

$$\begin{aligned}P\{\pi_n(x)-f(x)\ge z\} &= P\{\pi_n(x)-E\pi_n(x)\ge z-(E\pi_n(x)-f(x))\}\\ &\le P\Big\{\pi_n(x)-E\pi_n(x)\ge z-\frac{a}{2\sqrt{nh_n}}\Big\}\\ &\le P\Big\{\pi_n(x)-E\pi_n(x)\ge\frac z2\Big\}\end{aligned}$$

for $z \ge a/\sqrt{nh_n}$ and $n$ large enough. We get then from inequality (2.40) that, for $n \ge 1$, all $z > 0$ and some constants $D_1$ and $D_2$,

$$P\Big\{\pi_n(x)-E\pi_n(x)\ge\frac z2\Big\} = P\Big\{\sum_{i=1}^{N_n}Y_i-nEY_1\ge\frac{nh_nz}{2}\Big\} \le \exp\Big(-\frac{(nh_n)^2z^2}{D_1nh_n+D_2nh_nz}\Big) = \exp\Big(-\frac{nh_nz^2}{D_1+D_2z}\Big).$$

We see that for some $a > 0$, for all $z \ge a/\sqrt{nh_n}$ and $n$ large enough,

$$\frac{nh_nz^2}{D_1+D_2z} \ge \sqrt{nh_n}\,z.$$
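The displayed inequality is elementary. For $z > 0$ and $\sqrt{nh_n} > D_2$ it unwinds as follows, which shows that any $a > D_1$ works once $n$ is large:

$$\frac{nh_nz^2}{D_1+D_2z} \ge \sqrt{nh_n}\,z \iff \sqrt{nh_n}\,z \ge D_1+D_2z \iff z\big(\sqrt{nh_n}-D_2\big) \ge D_1.$$

For $z \ge a/\sqrt{nh_n}$ the left side of the last inequality is at least $a\big(1-D_2/\sqrt{nh_n}\big)$. This exceeds $D_1$ for all large $n$ when $a > D_1$.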



Observe that for $0 < z < a/\sqrt{nh_n}$,

$$\exp(a)\exp(-\sqrt{nh_n}\,z) \ge \exp(a)\exp(-a) = 1 \ge P\{\pi_n(x)-f(x)\ge z\}.$$
Therefore, by setting $A = \exp(a)$, we get for all large enough $n \ge 1$, all $z > 0$ and all $x$,

$$P\{\pi_n(x)-f(x)\ge z\} \le A\exp(-\sqrt{nh_n}\,z).$$
In the same way, for all large enough $n \ge 1$, all $z > 0$ and all $x$,

$$P\{\pi_n(x)-f(x)\le -z\} \le A\exp(-\sqrt{nh_n}\,z).$$
Notice these inequalities imply that for all large enough $n \ge 1$ and all $x$,

$$\sqrt{P\{|\pi_n(x)-f(x)|\ge|c-f(x)|\}} \le \sqrt{2A}\,\exp\Big(-\frac{\sqrt{nh_n}\,|c-f(x)|}{2}\Big). \tag{2.43}$$
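Spelled out, this bound combines the two one-sided inequalities. For $z > 0$,

$$P\{|\pi_n(x)-f(x)|\ge z\} \le P\{\pi_n(x)-f(x)\ge z\}+P\{\pi_n(x)-f(x)\le -z\} \le 2A\exp(-\sqrt{nh_n}\,z).$$

Taking $z = |c-f(x)|$ and square roots gives the factor $\sqrt{2A}$ and the halved exponent. When $f(x) = c$ the bound is trivial, since $2A = 2e^a \ge 1$.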



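Returning to the proof of (2.34), from (2.36), (2.37), (2.39) and (2.43) we get that for all large enough $n \ge 1$,

$$\begin{aligned}&(nh_n)^{1/2+\gamma}\int_{D_n^c(\tau)\cap E_n}\int_B g_n(x,t)\,dt\,dx\\ &\quad\le \sqrt{nh_n}\int_{D_n^c(\tau)\cap E_n}\int_B \sqrt{2A}\,e^{-\sqrt{nh_n}\,|c-f(x)|/2}\,(nh_n)^{\gamma}g(x)g(x+th_n^{1/d})\,dt\,dx,\end{aligned}$$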
which equals



where 



V^nix) = V Aexp 



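Our assumptions imply that for some $0 < \eta < 1$, for all $1 \le |s| \le \varsigma\sqrt{\log n}$ and $n$ large,

$$\sqrt{nh_n}\,\Big|c-f\Big(y(\theta)+\frac{s\,u(\theta)}{\sqrt{nh_n}}\Big)\Big| \ge \eta|s|.$$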

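[See Mason and Polonik (2008), Appendix, Detail 3.] We get, using the change of variables (2.29), that for all $\tau \ge 1$,

$$\int_{D_n^c(\tau)\cap E_n}\psi_n(x)\,dx = \int_{\tau\le|s|\le\varsigma\sqrt{\log n}}\int_{I_d}\psi_n\Big(y(\theta)+\frac{s\,u(\theta)}{\sqrt{nh_n}}\Big)\,|J_n(\theta,s)|\,d\theta\,ds.$$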



Thus, by our assumptions [refer to the remarks after (2.31) and assumption (G)], there exists a constant $C > 0$ such that for all large enough $\tau$ and $n$



LPn{x)dx 



< 



C 



X 



T<\s\<';WlognJ la 



1 2/79 



< 



c 



Vnhn Jt<\s\< 



X exp 
exp 



Vnhn\c- f{y{6) + su{e) / y/nhn)\ 



dOds 



vf-v/logn J Id 



rj\s\ 



d9 ds. 



Thus 



$$\int_{D_n^c(\tau)\cap E_n}\psi_n(x)\,dx \le \frac{4\pi^{d-1}C\exp(-\eta\tau/2)}{\eta\sqrt{nh_n}}. \tag{2.44}$$

Therefore, after inserting all of the above bounds, we get that

$$(nh_n)^{1/2+\gamma}\int_{D_n^c(\tau)\cap E_n}\int_B g_n(x,t)\,dt\,dx \le \frac{4\pi^{d-1}C\exp(-\eta\tau/2)}{\eta},$$

and hence we readily conclude that (2.34) holds.
Putting everything together we get that as $n \to \infty$,

$$a_n^2\operatorname{Var}(\Pi_n(c)) \to \sigma^2 \tag{2.45}$$

with $\sigma^2$ defined as in (2.33). For future use, we point out that we can infer by (2.24), (2.34) and (2.45) that for all $\varepsilon > 0$ there exist a $\tau_0$ and an $n_0 \ge 1$ such that for all $\tau \ge \tau_0$ and $n \ge n_0$

$$|\sigma_n^2(\tau)-\sigma^2| < \varepsilon, \tag{2.46}$$

where

$$\sigma_n^2(\tau) = \operatorname{Var}\Big(a_n\int_{D_n(\tau)}\Delta_n(x)g(x)\,dx\Big). \tag{2.47}$$

Our next goal is to de-Poissonize by applying the following version of a 
theorem in Beirlant and Mason (1995). 

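Lemma 2.2. Let $N_{1,n}$ and $N_{2,n}$ be independent Poisson random variables with $N_{1,n}$ being Poisson$(n\beta_n)$ and $N_{2,n}$ being Poisson$(n(1-\beta_n))$, where $\beta_n \in (0,1)$. Denote $N_n = N_{1,n}+N_{2,n}$ and set

$$U_n = \frac{N_{1,n}-n\beta_n}{\sqrt n}\quad\text{and}\quad V_n = \frac{N_{2,n}-n(1-\beta_n)}{\sqrt n}.$$

Let $\{S_n\}_{n=1}^{\infty}$ be a sequence of random variables such that:

(i) for each $n \ge 1$, the random vector $(S_n, U_n)$ is independent of $V_n$,

(ii) for some $\sigma^2 < \infty$, $S_n \stackrel{d}{\to} \sigma Z$, as $n \to \infty$,

(iii) $\beta_n \to 0$, as $n \to \infty$.

Then, for all $x$,

$$P\{S_n \le x \mid N_n = n\} \to P\{\sigma Z \le x\}.$$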

The proof follows along the same lines as Lemma 2.4 in Beirlant and Mason 
(1995). [See Mason and Polonik (2008), Appendix, Detail 6.] 

We shall now use this de-Poissonization lemma to complete the proof of our theorem. Recall the definitions of $L_n(c)$ and $\Pi_n(c)$ in (2.18) and (2.19), respectively. Noting that $D_n(\tau) \subset E_n$ for all large enough $n \ge 1$, we see that

$$\begin{aligned}a_n(L_n(c)-E\Pi_n(c)) &= a_n\int_{D_n(\tau)}\big\{|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|-E\Delta_n(x)\big\}g(x)\,dx\\ &\quad+a_n\int_{D_n^c(\tau)\cap E_n}\big\{|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|-E\Delta_n(x)\big\}g(x)\,dx\\ &=: T_n(\tau)+R_n(\tau).\end{aligned}$$
We can control the $R_n(\tau)$ piece of this sum using the inequality, which follows from Lemma 2.3 below,

$$\begin{aligned}E(R_n(\tau))^2 &\le 2a_n^2\operatorname{Var}\Big(\int_{D_n^c(\tau)\cap E_n}|I\{\pi_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx\Big)\\ &= 2(nh_n)^{1/2+\gamma}\int_{D_n^c(\tau)\cap E_n}\int_B g_n(x,t)\,dt\,dx,\end{aligned}\tag{2.48}$$

which goes to zero as $n \to \infty$ and then $\tau \to \infty$, as we proved in (2.34).

The needed inequality is a special case of the following result in Giné, Mason and Zaitsev (2003). We say that a set $D$ is a (commutative) semigroup if it has a commutative and associative operation, in our case sum, with a zero element. If $D$ is equipped with a $\sigma$-algebra $\mathcal{D}$ for which the sum, $+ : (D\times D, \mathcal{D}\otimes\mathcal{D}) \to (D,\mathcal{D})$, is measurable, then we say that $(D,\mathcal{D})$ is a measurable semigroup.

Lemma 2.3. Let $(D,\mathcal{D})$ be a measurable semigroup; let $Y_0 = 0 \in D$ and let $Y_i$, $i \in \mathbb{N}$, be independent identically distributed $D$-valued random variables; for any given $n \in \mathbb{N}$, let $\eta$ be a Poisson random variable with mean $n$ independent of the sequence $\{Y_i\}$; and let $B \in \mathcal{D}$ be such that $P\{Y_1 \in B\} \le 1/2$. If $G : D \to \mathbb{R}$ is nonnegative and $\mathcal{D}$-measurable, then

$$EG\Big(\sum_{i=0}^{n}I(Y_i\in B)Y_i\Big) \le 2EG\Big(\sum_{i=0}^{\eta}I(Y_i\in B)Y_i\Big). \tag{2.49}$$

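Next we consider $T_n(\tau)$. Observe that

$$(S_n(\tau)\mid N_n = n) \stackrel{d}{=} \frac{T_n(\tau)}{\sigma_n(\tau)}, \tag{2.50}$$

where, as above, $N_n$ denotes a Poisson random variable with mean $n$,

$$S_n(\tau) = \frac{a_n\int_{D_n(\tau)}\{\Delta_n(x)-E\Delta_n(x)\}g(x)\,dx}{\sigma_n(\tau)}$$

and $\sigma_n^2(\tau)$ is defined as in (2.47). We shall apply Lemma 2.2 to $S_n(\tau)$ with

$$N_{1,n} = \sum_{i=1}^{N_n}1\{X_i\in D_n(\tau+\sqrt{nh_n})\},\qquad N_{2,n} = \sum_{i=1}^{N_n}1\{X_i\notin D_n(\tau+\sqrt{nh_n})\}$$

and

$$\beta_n = P\{X_1\in D_n(\tau+\sqrt{nh_n})\}.$$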
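The counts $N_{1,n}$ and $N_{2,n}$ show why Poissonization helps. With a Poisson$(n)$ sample size, the numbers of points inside and outside a region are independent Poisson variables with means $n\beta_n$ and $n(1-\beta_n)$. With a fixed sample size $n$, they are negatively correlated, with covariance $-n\beta_n(1-\beta_n)$.

The Python sketch below illustrates this by simulation. The uniform design on $[0,1)$, the region $[0,\beta)$, the parameter values and the helper names are assumptions made only for the illustration.

```python
import math
import random


def poisson(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def split_counts(n=50, beta=0.2, reps=4000, poissonize=True, seed=7):
    """Monte Carlo means and covariance of (N1, N2): the numbers of X_i ~ U[0, 1)
    falling in [0, beta) and in [beta, 1), for sample size Poisson(n) or fixed n."""
    rng = random.Random(seed)
    n1s, n2s = [], []
    for _ in range(reps):
        size = poisson(rng, n) if poissonize else n
        n1 = sum(1 for _ in range(size) if rng.random() < beta)
        n1s.append(n1)
        n2s.append(size - n1)
    m1, m2 = sum(n1s) / reps, sum(n2s) / reps
    cov = sum((a - m1) * (b - m2) for a, b in zip(n1s, n2s)) / (reps - 1)
    return m1, m2, cov


print(split_counts(poissonize=True))    # theory: means 10 and 40, covariance 0
print(split_counts(poissonize=False))   # theory: covariance -n*beta*(1-beta) = -8
```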

We first need to verify that, as $n \to \infty$,

$$S_n(\tau) \stackrel{d}{\to} Z.$$

To show this we require the following special case of Theorem 1 of Shergin (1990).

Fact 2 [Shergin (1990)]. Let $\{X_{i,n} : i \in \mathbb{Z}^d\}$ denote a triangular array of mean zero $m$-dependent random fields, and let $J_n \subset \mathbb{Z}^d$ be such that:

(i) $\operatorname{Var}\big(\sum_{i\in J_n}X_{i,n}\big) \to 1$, as $n \to \infty$, and

(ii) for some $2 < s \le 3$, $\sum_{i\in J_n}E|X_{i,n}|^s \to 0$, as $n \to \infty$.

Then

$$\sum_{i\in J_n}X_{i,n} \stackrel{d}{\to} Z,$$

where $Z$ is a standard normal random variable.

We use Shergin's result as follows. Under our regularity conditions, for each $\tau > 0$ there exist positive constants $d_1, \dots, d_5$ such that for all large enough $n$,

$$|D_n(\tau)| \le \frac{d_1}{\sqrt{nh_n}}, \tag{2.51}$$

$$d_2 \le \sigma_n(\tau) \le d_3. \tag{2.52}$$

Clearly (2.52) follows from (2.46), and it is not difficult to see (2.51). For details see Mason and Polonik (2008), Appendix, Detail 7. There it is also shown that for each such integer $n \ge 1$ there exists a partition $\{R_i, i \in J_n \subset \mathbb{Z}^d\}$ of $D_n(\tau)$ such that for each $i \in J_n$

$$|R_i| \le d_4h_n, \tag{2.53}$$

where

$$|J_n| =: k_n \le \frac{d_5}{h_n\sqrt{nh_n}}. \tag{2.54}$$
Define

$$X_{i,n} = \frac{a_n\int_{R_i}\{\Delta_n(x)-E\Delta_n(x)\}g(x)\,dx}{\sigma_n(\tau)}.$$
It is straightforward to see that $X_{i,n}$ can be extended to a 1-dependent random field on $\mathbb{Z}^d$. [See Mason and Polonik (2008), Appendix, Detail 7.] Notice that by (G) there exists a constant $A > 0$ such that for all $x \in D_n(\tau)$,

$$|g(x)| \le A(\sqrt{nh_n})^{-\gamma}.$$
Recalling that $a_n = a_{n,\gamma} = (n/h_n)^{1/4}(\sqrt{nh_n})^{\gamma}$, we thus obtain for all $i \in J_n$,

$$|X_{i,n}| \le \frac{2a_nA|R_i|(\sqrt{nh_n})^{-\gamma}}{\sigma_n(\tau)}.$$



Therefore, 



This bound, when combined with (H), implies that as $n \to \infty$,
which by the Shergin fact (with s = 5/2) yields 

Thus, using (2.50) and $\beta_n = P\{X_1\in D_n(\tau+\sqrt{nh_n})\} \to 0$, Lemma 2.2 implies that

$$\frac{T_n(\tau)}{\sigma_n(\tau)} \stackrel{d}{\to} Z. \tag{2.55}$$

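Putting everything together we get from (2.48) that

$$\lim_{\tau\to\infty}\limsup_{n\to\infty}E(R_n(\tau))^2 = 0$$

and from (2.46) that

$$\lim_{\tau\to\infty}\limsup_{n\to\infty}|\sigma_n^2(\tau)-\sigma^2| = 0,$$

which in combination with (2.55) implies that

$$a_n(L_n(c)-E\Pi_n(c)) \stackrel{d}{\to} \sigma Z, \tag{2.56}$$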



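where

$$\sigma^2 = \int_{-\infty}^{\infty}\int_{I_d}\int_B\Gamma(\theta,s,t)\,\iota(\theta)\,dt\,d\theta\,ds \tag{2.57}$$

with $\Gamma(\theta,s,t)$ as defined in (2.28). Since by Lemma 2.3

$$E\big(a_n(L_n(c)-E\Pi_n(c))\big)^2 \le 2\operatorname{Var}(a_n\Pi_n(c))$$

and

$$\operatorname{Var}(a_n\Pi_n(c)) \to \sigma^2 < \infty,$$

the sequence $a_n(L_n(c)-E\Pi_n(c))$ has bounded second moments and is therefore uniformly integrable. Together with (2.56) this lets us conclude that

$$a_n(EL_n(c)-E\Pi_n(c)) \to 0$$

and thus

$$a_n(L_n(c)-EL_n(c)) \stackrel{d}{\to} \sigma Z.$$

This gives that

$$a_n\Big(G(C_n(c)\Delta C(c)) - \int E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,dG(x)\Big) \stackrel{d}{\to} \sigma Z, \tag{2.58}$$

which is (2.14). In light of (2.58), and keeping in mind that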

$$EG(C_n(c)\Delta C(c)) = \int E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx,$$

we see that to complete the proof of (2.1) it remains to show that

$$a_nE\int_{E_n^c}|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx \to 0. \tag{2.59}$$

We shall begin by bounding

$$E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|,\qquad x\in E_n^c.$$

Applying inequality (2.38) with $a = f_n(x)-f(x)$ and $b = c-f(x)$ we have for $x \in E_n^c$,

$$\begin{aligned}E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}| &\le EI\{|f_n(x)-f(x)|\ge|c-f(x)|\}\\ &= P\{|f_n(x)-f(x)|\ge|c-f(x)|\}\\ &\le P\{|f_n(x)-Ef_n(x)|\ge|c-f(x)|-|f(x)-Ef_n(x)|\}.\end{aligned}$$
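By recalling the definition of $E_n$ in (2.17) we obtain

$$\begin{aligned}E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}| &\le P\Big\{|f_n(x)-Ef_n(x)|\ge\frac{\varsigma\sqrt{\log n}}{\sqrt{nh_n}}-|f(x)-Ef_n(x)|\Big\}\\ &\le P\Big\{|f_n(x)-Ef_n(x)|\ge\frac{\varsigma\sqrt{\log n}}{\sqrt{nh_n}}-A_1h_n^{2/d}\Big\}.\end{aligned}$$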

The last inequality uses the fact that (K.i), (K.ii), (2.7) and (2.8) imply, after a change of variables and an application of Taylor's formula for $f(x+h_n^{1/d}v)-f(x)$, that for some constant $A_1 > 0$,

$$\sup_{n\ge2}h_n^{-2/d}\sup_x|Ef_n(x)-f(x)| \le A_1.$$



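Thus for all large enough $n$, uniformly in $x \in E_n^c$,

$$E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}| \le P\Big\{|f_n(x)-Ef_n(x)|\ge\frac{\varsigma\sqrt{\log n}}{2\sqrt{nh_n}}\Big\} =: p_n(x),$$

where the last inequality uses (H). We shall bound $p_n(x)$ using Bernstein's inequality on the i.i.d. sum

$$f_n(x)-Ef_n(x) = \sum_{i=1}^{n}\frac{1}{nh_n}\Big\{K\Big(\frac{x-X_i}{h_n^{1/d}}\Big)-EK\Big(\frac{x-X_i}{h_n^{1/d}}\Big)\Big\}.$$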

Notice that for each i = 1, . . . , n 



K\u)f{x-h]/''u)du<^^^^ 



n^hn Jr^ " ~ in?hr. 

and by (K.i), 

2k 



nhr. 



< 



nhr. 



Therefore by Bernstein's inequality [see page 855 of Shorack and Wellner (1986)],

-<j^(logn)/(4n/i„) 



Pn{x) < 2exp 



||i^||iM/(n/in) + 2/3^(logn)V2/(2(n/i„)V2)^/(n/i„ 
-?2(logn)/4



2exp 



\K\\lM + K?(logn)V2/(3(n/i„)i/2 



Hence by (H), and keeping in mind that $\varsigma > \sqrt2$ in (2.17), we get for some constant $a > 0$ that for all large enough $n$, uniformly in $x \in E_n^c$, we have the bound

$$p_n(x) \le 2\exp(-\varsigma^2a\log n). \tag{2.60}$$

We shall show below that $\lambda(C_n(c)\Delta C(c)) \le m$ for some $0 < m < \infty$. Assuming this to be true, we have the following [similar lines of argument are used in Rigollet and Vert (2008)]

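$$\begin{aligned}E\int_{E_n^c}|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx &= E\int_{E_n^c\cap(C_n(c)\Delta C(c))}|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx\\ &\le \sup_{A:\lambda(A)\le m}E\int_{E_n^c\cap A}|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx\\ &\le \sup_{A:\lambda(A)\le m}\int_{E_n^c\cap A}E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx\\ &\le m\sup_x g(x)\sup_{x\in E_n^c}E|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\\ &\le m\sup_x g(x)\sup_{x\in E_n^c}p_n(x).\end{aligned}$$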
With $c_0 = m\sup_x g(x)$ and (2.60) this gives the bound

$$a_nE\int_{E_n^c}|I\{f_n(x)\ge c\}-I\{f(x)\ge c\}|\,g(x)\,dx \le 2c_0a_n\exp(-\varsigma^2a\log n).$$

Clearly by (H), we see that for large enough $\varsigma > 0$,

$$a_n\exp(-\varsigma^2a\log n) \to 0,$$

and thus (2.59) follows. It remains to verify that there exists $0 < m < \infty$ with

$$\lambda(C_n(c)\Delta C(c)) \le m.$$

Notice that

$$1 \ge \int_{C_n(c)}f_n(x)\,dx \ge c\,\lambda(C_n(c))$$

and

$$1 \ge \int_{C(c)}f(x)\,dx \ge c\,\lambda(C(c)).$$

Thus

$$\lambda(C_n(c)\Delta C(c)) \le \lambda(C_n(c))+\lambda(C(c)) \le 2/c =: m.$$
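The bound $\lambda(C_n(c)\Delta C(c)) \le 2/c$ holds for every sample, since it uses only $\int f_n = 1$ and $f_n \ge 0$. The one-dimensional Python sketch below makes this concrete. It rests on several assumptions made purely for illustration:

- $f$ is standard normal, so $C(c) = [-r, r]$ with $r = \sqrt{-2\log(c\sqrt{2\pi})}$.
- The kernel is the uniform density on $[-1/2, 1/2]$.
- The bandwidth is an arbitrary fixed value.
- Lebesgue measure is approximated on a grid of step 0.01.

```python
import math
import random


def kde_1d(sample, h, x):
    """f_n(x) = (1/(n h)) * sum K((x - X_i)/h), K = uniform density on [-1/2, 1/2]."""
    hits = sum(1 for xi in sample if abs(x - xi) <= 0.5 * h)
    return hits / (len(sample) * h)


def level_set_measures(c, n=200, h=0.3, step=0.01, seed=3):
    """Grid approximations of lambda(C_n(c)) and lambda(C_n(c) Delta C(c))
    for a N(0, 1) sample; requires 0 < c < 1/sqrt(2 pi)."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(-2.0 * math.log(c * math.sqrt(2.0 * math.pi)))   # C(c) = [-r, r]
    grid = [-6.0 + step * k for k in range(int(round(12.0 / step)) + 1)]
    in_cn = [kde_1d(sample, h, x) >= c for x in grid]
    in_c = [abs(x) <= r for x in grid]
    lam_cn = step * sum(in_cn)
    lam_sym = step * sum(a != b for a, b in zip(in_cn, in_c))
    return lam_cn, lam_sym


for c in (0.05, 0.1, 0.2):
    lam_cn, lam_sym = level_set_measures(c)
    print(c, lam_cn, 1 / c, lam_sym, 2 / c)
```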

We see now that the proof of the theorem in the case $k = 1$ and $d \ge 2$ is complete.

The proof for the case $k \ge 2$ goes through by an obvious extension of the argument used in the case $k = 1$. On account of (B.ii) we can write for large enough $n$

$$\int_{E_n}\Delta_n(x)\,dG(x) = \sum_{j=1}^{k}\int_{E_{j,n}}\Delta_n(x)\,dG(x),$$

where the sets $E_{j,n}$, $j = 1, \dots, k$, are disjoint and constructed from the $\beta_j$ just as $E_n$ was formed from the boundary set $\beta$ in the proof for the case $k = 1$. Therefore, by reason of the Poissonization, the summands are independent. Hence the asymptotic normality readily follows as before, where the limiting variance in (2.1) becomes

$$\sigma^2 = \sum_{j=1}^{k}\sigma_j^2, \tag{2.61}$$

where each $\sigma_j^2$ is formed just like (2.57).

2.6. Proof of the theorem in the case $d = 1$. The case $d = 1$ follows along very similar ideas as presented above in the case $d \ge 2$ and is in fact somewhat simpler than the case $d \ge 2$. We therefore skip all the details and only point out that by assumption (B.ii) the boundary set $\beta = \{x \in \mathbb{R} : f(x) = c\}$ consists of $k$ points $z_i$, $i = 1, \dots, k$. Therefore, the integral over $\theta$ in the definition of $\sigma^2$ in (2.57) has to be replaced by a sum, leading to

^ roc rl 

(2.62) :=Y,{g('\zi)f / T{i,s,t)\sf/^^ dtds, 

where 

T{i,s,t) 



cov 



sf'jz, 
V~c\\K\ 



Ii0> 



I\pit)Z^ + Jl-p^{t)Z2> 



Vc\\Kh 
sf'izi) 



Il0> 



sf'jz-, 

We can drop the absolute value sign on $f'(z_i)$ in our definition of $\Gamma(i,s,t)$ for $i = 1, \dots, k$, and thus in $\sigma^2$, since $\rho(t) = \rho(-t)$. □
2.7. Remarks on the variance and its estimation. Clearly the variance $\sigma_G^2$ that appears in Theorem 1 does not have a nice closed form and in many situations is not feasible to calculate. Therefore in applications $\sigma_G^2$ will very likely have to be estimated either by simulation or from the data itself. In the latter case, an obvious suggestion is to apply the bootstrap and another is to use the jackknife. A separate investigation is required to verify that these methods work in this setup. [Similarly we may also need to estimate $EG(C_n(c)\Delta C(c))$.]

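Here is a quick and dirty way to estimate $\sigma_G^2$. Let $X_1, \dots, X_n$ be i.i.d. $f$. Choose a sequence of integers $1 \le m_n \le n$ such that $m_n \to \infty$ and $n/m_n \to \infty$. Set $\varsigma_n = [n/m_n]$ and take a random sample of the data $X_1, \dots, X_n$ of size $m_n\varsigma_n \le n$. Then randomly divide this sample into $\varsigma_n$ disjoint samples of size $m_n$. Let

$$\xi_i = G\big(C_{m_n}^{(i)}(c)\Delta C(c)\big)\quad\text{for } i = 1, \dots, \varsigma_n,$$

where $C_{m_n}^{(i)}(c)$ is formed from sample $i$. We propose as our estimator of $\sigma_G^2$ the sample variance of $\{(m_n/h_{m_n})^{1/4}\xi_i\}$, $i = 1, \dots, \varsigma_n$,

$$\Big(\frac{m_n}{h_{m_n}}\Big)^{1/2}\sum_{i=1}^{\varsigma_n}(\xi_i-\bar\xi)^2\Big/(\varsigma_n-1).$$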

Under suitable regularity conditions it is routine to show that this is a consistent estimator of $\sigma_G^2$; again, the details are beyond the scope of this paper.
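Below is a minimal sketch of the subsampling recipe, written for the bivariate normal example treated next. In that setting $C(c)$ is known, $G = \lambda$ and $\gamma = 0$. The sketch is illustrative only. The following are assumptions and are not part of the paper:

- the uniform kernel on the disk of radius 1/2;
- the bandwidth;
- the grid approximation of $\lambda$;
- the names `kde_2d` and `subsample_variance`.

Each subsample gives $\xi_i = \lambda(C^{(i)}_{m}(c)\Delta C(c))$, and the function returns the sample variance of $(m/h_m)^{1/4}\xi_i$.

```python
import math
import random


def kde_2d(sample, h, x, y):
    """f_m(x, y) = (1/(m h)) * sum K(((x, y) - X_i) / h^{1/2}) in d = 2, with K the
    uniform density 4/pi on the disk of radius 1/2 (an illustrative kernel)."""
    hits = sum(1 for (a, b) in sample if (x - a) ** 2 + (y - b) ** 2 <= 0.25 * h)
    return hits * (4.0 / math.pi) / (len(sample) * h)


def subsample_variance(data, m, h, c, grid_n=41, lim=3.5, seed=0):
    """Sample variance of (m/h)^{1/4} * lambda(C_m^{(i)}(c) Delta C(c)) over the
    floor(n/m) disjoint subsamples of size m (gamma = 0, G = Lebesgue measure).
    Assumes f is standard bivariate normal, so C(c) is the disk of radius r(c)."""
    rng = random.Random(seed)
    k = len(data) // m                          # number of subsamples
    pool = rng.sample(data, k * m)              # random subsample of size k*m, random order
    step = 2.0 * lim / (grid_n - 1)
    grid = [-lim + step * j for j in range(grid_n)]
    r2 = -2.0 * math.log(2.0 * math.pi * c)     # r(c)^2
    scale = (m / h) ** 0.25
    xi = []
    for i in range(k):
        block = pool[i * m:(i + 1) * m]
        n_diff = sum(1 for x in grid for y in grid
                     if (kde_2d(block, h, x, y) >= c) != (x * x + y * y <= r2))
        xi.append(scale * n_diff * step * step)  # grid approximation of the symmetric difference
    mean = sum(xi) / k
    return sum((v - mean) ** 2 for v in xi) / (k - 1)


rng = random.Random(1)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
print(subsample_variance(data, m=50, h=50 ** (-1.0 / 3.0), c=0.05))
```

With only a few hundred points the output is very noisy. The recipe is meant to be consistent only in the regime $m_n \to \infty$, $n/m_n \to \infty$.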

The variance $\sigma_G^2$ under a bivariate normal model. In order to obtain a better understanding of the expression for the variance, we consider it in the following simple bivariate normal example. Assume

$$f(x,y) = \frac{1}{2\pi}\exp\Big(-\frac{x^2+y^2}{2}\Big),\qquad (x,y)\in\mathbb{R}^2.$$

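A special case of Theorem 1 says that whenever

$$\dots\quad\text{and}\quad nh_n/\log n \to \infty \tag{2.63}$$

(here $\gamma = 0$), then

$$\Big(\frac{n}{h_n}\Big)^{1/4}\big\{\lambda(C_n(c)\Delta C(c)) - E\lambda(C_n(c)\Delta C(c))\big\} \stackrel{d}{\to} \sigma_\lambda Z. \tag{2.64}$$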

We shall calculate $\sigma_\lambda^2$ in this case. We get that

$$f'(x,y) = -(x,y)f(x,y).$$

Notice that for any $0 < c < 1/(2\pi)$,

$$\beta = \{(x,y) : x^2+y^2 = -2\log(2\pi c)\}.$$
Setting

$$r(c) = \sqrt{-2\log(2\pi c)},$$

we see that $\beta$ is the circle with center 0 and radius $r(c)$. Choosing the obvious diffeomorphism,

$$y(\theta) = (r(c)\cos\theta,\ r(c)\sin\theta)\quad\text{for }\theta\in[0,2\pi],$$

we get that for $\theta \in [0,2\pi]$,

$$u(\theta) = (-\cos\theta,\ -\sin\theta),\qquad y'(\theta) = (-r(c)\sin\theta,\ r(c)\cos\theta)$$

and

$$\iota(\theta) = \det\begin{pmatrix}-r(c)\sin\theta & r(c)\cos\theta\\ -\cos\theta & -\sin\theta\end{pmatrix} = r(c).$$
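The geometric quantities above are easy to verify numerically. The Python sketch below is an illustration under the stated normal model; the function name is hypothetical. Along $\beta$ it checks that:

- $f(y(\theta)) = c$;
- $u(\theta)$ is the unit vector in the direction of $f'(y(\theta))$;
- $\|f'(y(\theta))\| = r(c)\,c$;
- the determinant defining $\iota(\theta)$ equals $r(c)$.

```python
import math


def circle_checks(c, n_theta=100):
    """Largest deviation, over a grid of theta, among the identities
    f(y) = c, f'(y)/||f'(y)|| = u, ||f'(y)|| = r c, det[y'; u] = r."""
    r = math.sqrt(-2.0 * math.log(2.0 * math.pi * c))     # r(c)

    def f(x, y):
        return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

    worst = 0.0
    for k in range(n_theta):
        t = 2.0 * math.pi * k / n_theta
        x, y = r * math.cos(t), r * math.sin(t)            # y(theta) on beta
        gx, gy = -x * f(x, y), -y * f(x, y)                # f'(x, y) = -(x, y) f(x, y)
        norm = math.hypot(gx, gy)
        ux, uy = -math.cos(t), -math.sin(t)                # u(theta)
        dx, dy = -r * math.sin(t), r * math.cos(t)         # y'(theta)
        det = dx * uy - dy * ux                            # rows y'(theta), u(theta)
        worst = max(worst, abs(f(x, y) - c), abs(gx / norm - ux),
                    abs(gy / norm - uy), abs(norm - r * c), abs(det - r))
    return worst


for c in (0.01, 0.05, 0.1, 0.15):
    print(c, circle_checks(c))
```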



Here $g = 1$ and we are assuming $\gamma = 0$. We get that

$$c(s,\theta,t) = c(s,\theta,0) = -\frac{s\,\|f'(y(\theta))\|}{\sqrt c\,\|K\|_2} = -\frac{s\,r(c)\sqrt c}{\|K\|_2}.$$

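Thus

$$\Gamma(\theta,s,t) = \operatorname{cov}\Big(\Big|I\Big\{Z_1\ge-\frac{s\,r(c)\sqrt c}{\|K\|_2}\Big\}-I\Big\{0\ge-\frac{s\,r(c)\sqrt c}{\|K\|_2}\Big\}\Big|,\ \Big|I\Big\{\rho(t)Z_1+\sqrt{1-\rho^2(t)}\,Z_2\ge-\frac{s\,r(c)\sqrt c}{\|K\|_2}\Big\}-I\Big\{0\ge-\frac{s\,r(c)\sqrt c}{\|K\|_2}\Big\}\Big|\Big).$$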

This gives

$$\sigma_\lambda^2 = r(c)\int_{-\infty}^{\infty}\int_0^{2\pi}\int_B\Gamma(\theta,s,t)\,dt\,d\theta\,ds.$$



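Set

$$\widetilde\Gamma(\theta,u,t) = \operatorname{cov}\Big(|I\{Z_1\ge-u\}-I\{0\ge-u\}|,\ \big|I\{\rho(t)Z_1+\sqrt{1-\rho^2(t)}\,Z_2\ge-u\}-I\{0\ge-u\}\big|\Big).$$

We see then by the change of variables $u = s\,r(c)\sqrt c/\|K\|_2$ that

$$\sigma_\lambda^2 = \frac{\|K\|_2}{\sqrt c}\int_{-\infty}^{\infty}\int_0^{2\pi}\int_B\widetilde\Gamma(\theta,u,t)\,dt\,d\theta\,du.$$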

For comparison, Theorem 2.1 of Cadre (2006) says that if

$$nh_n/(\log n)^{16} \to \infty \quad\text{and}\quad nh_n^3(\log n)^2 \to 0, \tag{2.65}$$
then

$$\sqrt{nh_n}\,\lambda(C_n(c)\Delta C(c)) \stackrel{P}{\to} \|K\|_2\sqrt{\frac{2c}{\pi}}\int_\beta\frac{d\mathcal H}{\|\nabla f\|} = 2\|K\|_2\sqrt{\frac{2\pi}{c}}. \tag{2.66}$$

The measure $d\mathcal H$ denotes the Hausdorff measure on $\beta$. In this case $\mathcal H(\beta)$ is the circumference of $\beta$.

Observe that since $\sqrt{nh_n}\,(h_n/n)^{1/4} = (nh_n^3)^{1/4} \to 0$, (2.64) and (2.66) imply that whenever (2.63) and (2.65) hold, we get

$$\sqrt{nh_n}\,E\lambda(C_n(c)\Delta C(c)) \to 2\|K\|_2\sqrt{\frac{2\pi}{c}}.$$

[Notice that the choice $h_n = 1/(\sqrt n\log n)$ satisfies both (2.63) and (2.65).]
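The closed form in (2.66) follows from $\mathcal H(\beta) = 2\pi r(c)$ and $\|\nabla f\| = r(c)\,c$ on $\beta$, so that $\int_\beta d\mathcal H/\|\nabla f\| = 2\pi/c$. The Python sketch below checks this arithmetic. It also evaluates the factor $(nh_n^3)^{1/4} = \sqrt{nh_n}\,(h_n/n)^{1/4}$ for $h_n = 1/(\sqrt n\log n)$. The value $\|K\|_2 = 2/\sqrt\pi$ corresponds to the uniform density on the disk of radius 1/2 and is only an illustrative choice of kernel; the helper names are hypothetical.

```python
import math


def cadre_limit(c, k_l2):
    """||K||_2 * sqrt(2c/pi) * int_beta dH/||grad f|| for the standard bivariate
    normal, where beta is the circle of radius r(c) and ||grad f|| = r(c) * c on beta."""
    r = math.sqrt(-2.0 * math.log(2.0 * math.pi * c))
    integral = (2.0 * math.pi * r) / (r * c)    # H(beta) / ||grad f||
    return k_l2 * math.sqrt(2.0 * c / math.pi) * integral


def closed_form(c, k_l2):
    return 2.0 * k_l2 * math.sqrt(2.0 * math.pi / c)


def bias_factor(n):
    """(n h_n^3)^{1/4} for h_n = 1/(sqrt(n) log n)."""
    h = 1.0 / (math.sqrt(n) * math.log(n))
    return (n * h ** 3) ** 0.25


K_L2 = 2.0 / math.sqrt(math.pi)   # ||K||_2, uniform density on the disk of radius 1/2
for c in (0.02, 0.05, 0.1):
    print(c, cadre_limit(c, K_L2), closed_form(c, K_L2))
print([bias_factor(10 ** k) for k in (2, 4, 6, 8)])
```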